(54) High performance low cost video game system with coprocessor providing high speed
efficient 3-D graphics and digital audio signal processing



(57) A low cost high performance three dimensional 
(3D) graphics system is disclosed that can model a 
world in three dimensions and project the model onto a 
two dimensional viewing plane selected based on a 
changeable viewpoint. The viewpoint can be changed 
on an interactive, real time basis by operating user input 
controls such as game controllers. The system rapidly 
produces a corresponding changing image (which can 
include animated cartoon characters or other animation) 
on the screen of a color television set. 



The richly featured high performance low cost system
is intended to give consumers the chance to interact
in real time right inside magnificent virtual 3D worlds to 
provide a high degree of image realism, excitement and 
flexibility. An optimum feature set/architecture (including 
a custom designed graphics/audio coprocessor) provides
high quality fast moving 3D images and digital
stereo sound for video game play and other graphics 
applications. Numerous features provide flexibility and 
capabilities in a system that is intended to be within the 
cost range of most consumers. 
Description

FIELD OF THE INVENTION 

The present invention relates to low cost video game systems. More particularly, the invention relates to a video 
game system that can model a world in three dimensions and project the model onto a two dimensional viewing plane 
selected based on a changeable viewpoint. 

BACKGROUND AND SUMMARY OF THE INVENTION 

People's imaginations are fueled by visual images. What we actually see at sunset, what we dream at night, the 
pictures we paint in our mind when we read a novel - all of these memorable scenes are composed of visual images.
Throughout history, people have tried to record these images with pencils or paints or video tape. But only with the 
advent of the computer can we begin to create images with the same vividness, detail and realism that they display in 
the real world or in the imagination. 

Computer-based home video game machines such as the Nintendo Entertainment System and the Super Nintendo 
Entertainment System have been highly successful because they can interactively produce exciting video graphics. 
However, without additional add-on hardware, these prior video graphics systems generally operated in two dimensions,
creating graphics displays from flat (planar) image representations in a manner somewhat analogous to tacking
flat paper cutouts onto a bulletin board. Although very exciting game play can be created using two dimensional graphics
techniques, a 2D system cannot provide the realism offered by a three-dimensional graphics system.

3D graphics are fundamentally different from 2D graphics. In 3D graphics techniques, a "world" is represented in 
three dimensional space. The system can allow the user to select a viewpoint within the world. The system creates an 
image by "projecting" the world based on the selected viewpoint. The result is a true three-dimensional image having 
depth and realism. 
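
By way of illustration only, the "projecting" step described above can be sketched as a simple perspective projection. The sketch below is not taken from the disclosed system: the `Viewpoint` class, the `project` function, the `focal_length` parameter and the convention that the viewer looks down the +z axis are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Viewpoint:
    """Hypothetical viewpoint: an eye position looking down the +z axis."""
    x: float
    y: float
    z: float
    focal_length: float = 1.0  # assumed distance from the eye to the viewing plane


def project(point, view):
    """Project a 3D world point onto the 2D viewing plane of `view`.

    Returns (u, v) plane coordinates, or None if the point lies at or
    behind the eye and therefore cannot be seen.
    """
    # Express the point relative to the eye position.
    dx = point[0] - view.x
    dy = point[1] - view.y
    dz = point[2] - view.z
    if dz <= 0:
        return None
    # Perspective divide: more distant points land closer to the plane's center.
    scale = view.focal_length / dz
    return (dx * scale, dy * scale)
```

Calling `project` on the same world point with two different `Viewpoint` values yields two different plane coordinates, which is the sense in which the image depends on the selected viewpoint.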

For many years, specialists have used super computers and high end workstations to create incredibly realistic
3D images - for example, ultra-detailed models of cars, planes and molecules; virtual reality as seen from the cockpit
of a jet fighter or the front seat of an Olympic bobsled; and the dinosaurs of "Jurassic Park." However, in the past, computer
systems required to produce such images interactively cost tens of thousands of dollars - well beyond the reach of the
average consumer.

The low cost high performance 3D graphics system disclosed herein is intended to give, for the first time, millions
of game players, not just the specialists, the chance to interact right inside these magnificent virtual 3D worlds with a 
richly featured high performance low cost system. What players get is truly amazing - many times the power of any 
home computer system, far more realistic 3-dimensional animation, stunning graphics - all delivered at a sufficiently 
low cost to be within the reach of the average consumer. 

The following are a few examples of the many advantageous features provided by a system in accordance with 
the present invention: 

• Realistic interactive 3D graphics in a low price system 

• Optimum feature set/architecture for a low cost system for use with a color television set to provide video game 
play and other graphics applications in a low cost system and/or to produce particular screen effects 

• Coprocessor that provides high performance 3D graphics and digital sound processing 

• Signal processor sharing between graphics digital processing and audio signal processing to achieve high quality 
stereo sound and 3-D graphics in a low cost color television based system 

• Unified RAM approach increases flexibility 

• All major system components can communicate through the shared RAM 

• Techniques/structures for compensating for narrow main memory bus width 

• Executable code from a storage device (e.g., a portable memory cartridge) can be loaded into the common RAM 
and accessed by the main processor through coprocessor memory access/arbitration circuitry 

• Graphics coprocessor loadable microcode store receives microcode from a portable storage medium to provide
additional flexibility and simplify compatibility issues

• Microcode is loaded via execution of "boot ROM" instructions

• Optimal commands and associated formats are used to invoke graphics and audio functions within the coprocessor
and provide an interface between the graphics coprocessor and the rest of the system

• Coprocessor register set including particular hardware register definitions, formats and associated functions

• Microcode graphics and audio structure/processes provide efficient high performance operation

• Vector unit provides optimal performance for graphics and audio digital processing in a low cost package

• Pipelined rasterizing engine provides one-pixel-per-cycle and two-pixel-per-cycle modes to minimize hardware
cost while providing a rich feature set

• Low coprocessor pin out

BRIEF DESCRIPTION OF THE DRAWINGS 


These and other features and advantages of the present invention will be better and more completely understood 
by referring to the following detailed description of a presently preferred exemplary embodiment in connection with the 
drawings, of which: 

Figure 1 shows an overall video game system capable of generating 3-D images and digitally processed stereo
sound;

Figures 1A-1F show example 3-D screen effects achievable using the Figure 1 system;

Figure 2 shows an example of principal components of an overall video game system; 

Figure 3 shows example major processing operations of an overall video game system; 
Figure 4 shows example overall operation of a video game system;

Figure 4A shows example overall steps performed by a video game system to generate graphics images; 

Figure 5 shows a detailed overall system architecture example; 

Figure 5A shows an example main processor initialization routine; 

Figure 5B shows an example main processor memory map; 
Figure 6 shows an example coprocessor internal architecture;

Figure 6A shows an example coprocessor internal bus architecture; 

Figure 7 shows an example signal processor internal architecture; 

Figure 7A shows an example signal processor instruction format; 

Figure 7B shows an example slicing of the Figure 7A source or destination field for processing by the vector unit
shown in Figure 7;

Figure 7C shows an example add operation performed by the example signal processor vector unit;
Figures 7D-7L show example signal processor registers;

Figure 8 shows an example hierarchical task list including graphics display lists and audio play lists; 
Figure 9 shows an example microcode load routine; 
Figure 10 shows an example of simple signal processor display list processing;

Figure 11 shows an example signal processor graphics microcode control step sequence; 
Figure 12A shows an example double precision representation; 
Figure 12B shows an example matrix format; 

Figure 13A shows an example signal processor vertex buffer format; 
Figure 13B shows an example vertex data definition;

Figure 13C shows an example signal processor segment addressing arrangement; 
Figure 14 shows an example audio software architecture; 

Figure 15 shows an example simple signal processor play list processing example; 
Figure 16 shows an example signal processor audio microcode control step sequence; 
Figure 17 shows an example signal processor audio processing construct;

Figure 18 shows example overall display processor processing steps; 
Figures 19A and 19B show example display processor pipeline configurations;
Figure 20 shows an example display processor architecture; 
Figures 21A-21J show example display processor registers;

Figure 22 shows an example texture memory tile descriptor arrangement; 

Figure 23 shows an example texture unit process; 

Figure 24 shows an example texture coordinate unit and texture memory unit architecture; 
Figure 25 shows an example texture memory color index mode lookup; 

Figure 26 shows an example more detailed use of the texture memory to store color indexed textures; 

Figure 27 shows an example color combiner operation; 

Figure 28 shows an example alpha combiner operation; 

Figure 29 shows an example alpha fix up operation; 

Figure 30 shows an example of blending different types of primitives; 

Figure 31 shows an example blender operation; 

Figure 32 shows an example color pixel format; 

Figure 33 shows an example depth (z) pixel format;

Figure 33A shows an example write enable generation process; 

Figure 34 shows an example video interface architecture; 

Figure 34A shows an example video interface operating sequence;

Figures 35A-35P show example video interface control registers;

Figure 36 shows an example main memory interface architecture; 

Figures 37A-37H show example memory interface controller registers; 

Figure 38 shows an example main processor interface architecture; 

Figures 39A-39D show example main processor interface registers; 

Figure 40 shows an example audio interface architecture; 

Figures 41A-41F show example audio interface registers;

Figure 42 shows an example serial interface architecture;

Figures 43A-43D show example serial interface registers; 

Figure 44 shows an example peripheral interface architecture; and 

Figures 45A-45I show example peripheral interface control/status registers. 

DETAILED DESCRIPTION OF A PRESENTLY PREFERRED EXAMPLE EMBODIMENT 

Figure 1 shows an example embodiment video game system 50 in accordance with the present invention(s). Video
game system 50 in this example includes a main unit 52, a video game storage device 54, and handheld controllers
56 (or other user input devices). In this example, main unit 52 connects to a conventional home color television set 58.
Television set 58 displays 3D video game images on its television screen 60 and reproduces stereo sound through its 
loud speakers 62. 

In this example, the video game storage device 54 is in the form of a replaceable memory cartridge insertable into 
a slot 64 on a top surface 66 of main unit 52. Video game storage device 54 can comprise, for example, a plastic 
housing 68 encasing a read only memory (ROM) chip 76. The read only memory 76 contains video game software in
this example. When the video game storage device 54 is inserted into main unit slot 64, cartridge electrical contacts 
74 mate with corresponding "edge connector" electrical contacts within the main unit. This action electrically connects 
the storage device's read only memory 76 to the electronics within main unit 52. 

"Read only memory" chip 76 stores software instructions and other information pertaining to a particular video 
game. The read only memory chip 76 in one storage device 54 may, for example, contain instructions and other information
for an adventure game. The read only memory chip 76 in another storage device 54 may contain instructions
and information to play a driving or car race game. The read only memory chip 76 of still another storage device 54
may contain instructions and information for playing an educational game. To play one game as opposed to another,
the user of video game system 50 simply plugs the appropriate storage device 54 into main unit slot 64, thereby connecting
the storage device's read only memory chip 76 (and any other circuitry the storage device may contain) to the
main unit 52. This enables the main unit 52 to access the information contained within read only memory 76, which
information controls the main unit to play the appropriate video game by displaying images and reproducing sound on
color television set 58 as specified under control of the video game software in the read only memory.

To play a video game using video game system 50, the user first connects main unit 52 to his or her color television
set 58 by hooking a cable 78 between the two. Main unit 52 produces both "video" signals and "audio" signals for
controlling color television set 58. The "video" signals are what controls the images displayed on the television screen
60, and the "audio" signals are played back as sound through television loudspeakers 62. Depending on the type of
color television set 58, it may be necessary to use an additional unit called an "RF modulator" in line between main
unit 52 and color television set 58. An "RF modulator" (not shown) converts the video and audio outputs of main unit
52 into a broadcast type television signal (e.g., on television channel 2 or 3) that can be received and processed using
the television set's internal "tuner."

The user also needs to connect main unit 52 to a power source. This power source may comprise a conventional 
AC adapter (not shown) that plugs into a standard home electrical wall socket and converts the house current into a 
lower DC voltage signal suitable for powering main unit 52. 

The user may then connect hand controllers 56a, 56b to corresponding connectors 80 on main unit front panel 82.
Controllers 56 may take a variety of forms. In this example, the controllers 56 shown each include various push buttons
84 and a directional switch or other control 86. The directional switch 86 can be used, for example, to specify the
direction (up, down, left or right) that a character displayed on television screen 60 should move and/or to specify a
point of view in a 3D world. Other possibilities include, for example, joysticks, mouse pointer controls and other conventional
user input devices. In this example, up to four controllers 56 can be connected to main unit 52 to allow 4-player
games.

The user then selects a storage device 54 containing the video game he or she wants to play, and inserts that
storage device into main unit slot 64 (thereby electrically connecting read only memory 76 to the main unit electronics
via a printed circuit board 70 and associated edge contacts 74). The user may then operate a power switch 88 to turn
on the video game system 50. This causes main unit 52 to begin playing the video game based on the software stored
in read only memory 76. He or she may operate controllers 56 to provide inputs to main unit 52 and thus affect the
video game play. For example, depressing one of push buttons 84 may cause the game to start. As mentioned before,
moving directional switches 86 can cause animated characters to move on the television screen 60 in different directions
or can change the user's point of view in a 3D world. Depending upon the particular video game stored within the
storage device 54, these various controls 84, 86 on the controller 56 can perform different functions at different times.
If the user wants to restart game play, he or she can press a reset button 90.

EXAMPLE 3D SCREEN EFFECTS 

System 50 is capable of processing, interactively in real time, a digital representation or model of a three-dimensional
world to display the world (or portions of it) from any arbitrary viewpoint within the world. For example, system
50 can interactively change the viewpoint in response to real time inputs from game controllers 56. This can permit,
for example, the game player to see the world through the eyes of a "virtual person" who moves through the world,
and looks and goes wherever the game player commands him or her to go. This capability of displaying quality 3D
images interactively in real time can create very realistic and exciting game play.

Figures 1A-1F show just one example of some three-dimensional screen effects that system 50 can generate on 
the screen of color television set 58. Figures 1A-1F are in black and white because patents cannot print in color, but
system 50 can display these different screens in brilliant color on the color television set. Moreover, system 50 can
create these images very rapidly (e.g., seconds or tenths of seconds) in real time response to operation of game
controllers 56.

Each of Figures 1A-1F was generated using a three-dimensional model of a "world" that represents a castle on a
hilltop. This model is made up of geometric shapes (i.e., polygons) and "textures" (digitally stored pictures) that are
"mapped" onto the surfaces defined by the geometric shapes. System 50 sizes, rotates and moves these geometric
shapes appropriately, "projects" them, and puts them all together to provide a realistic image of the three-dimensional
world from any arbitrary viewpoint. System 50 can do this interactively in real time response to a person's operation
of game controllers 56.

Figures 1A-1C and 1F show aerial views of the castle from four different viewpoints. Notice that each of the views
is in perspective. System 50 can generate these views (and views in between) interactively in a matter of seconds with 
little or no discernible delay so it appears as if the video game player is actually flying over the castle. 

Figures 1D and 1E show views from the ground looking up at or near the castle main gate. System 50 can generate
these views interactively in real time response to game controller inputs commanding the viewpoint to "land" in front
of the castle, and commanding the "virtual viewer" (i.e., the imaginary person moving through the 3-D world through
whose eyes the scenes are displayed) to face in different directions. Figure 1D shows an example of "texture mapping"
in which a texture (picture) of a brick wall is mapped onto the castle walls to create a very realistic image.

Overall Video Game System Electronics 

Figure 2 shows that the principal electronics within main unit 52 includes a main processor 100, a coprocessor
200, and main memory 300. Main processor 100 is a computer that runs the video game program provided by storage
device 54 based on inputs provided by controllers 56. Coprocessor 200 generates images and sound based on instructions
and commands it gets from main processor 100. Main memory 300 is a fast memory that stores the information
main processor 100 and coprocessor 200 need to work, and is shared between the main processor and the
coprocessor. In this example, all accesses to main memory 300 are through coprocessor 200.

In this example, the main processor 100 accesses the video game program through coprocessor 200 over a communication
path 102 between the main processor and the coprocessor 200. Main processor 100 can read from storage
device 54 via another communication path 104 between the coprocessor and the video game storage device. The
main processor 100 can copy the video game program from the video game storage device 54 into main memory 300
over path 106, and can then access the video game program in main memory 300 via coprocessor 200 and paths 102,
106.

Main processor 100 generates, from time to time, lists of commands that tell the coprocessor 200 what to do.
Coprocessor 200 in this example comprises a special purpose high performance application-specific integrated circuit
(ASIC) having an internal design that is optimized for rapidly processing 3-D graphics and digital audio. In response
to commands provided by main processor 100 over path 102, coprocessor 200 generates video and audio for application
to color television set 58. The coprocessor 200 uses graphics, audio and other data stored within main memory
300 and/or video game storage device 54 to generate images and sound.

Figure 2 shows that coprocessor 200 in this example includes a signal processor 400 and a display processor
500. Signal processor 400 is an embedded programmable microcontroller that performs graphics geometry processing
and audio digital signal processing under control of a "microcode" computer program supplied by video game storage
device 54. Display processor 500 is a high speed state machine that renders graphics primitives, thereby creating
images for display on television 58. The signal processor 400 and display processor 500 work independently, but the
signal processor can supervise the display processor by sending graphics commands to it. Both signal processor 400
and display processor 500 can be controlled directly by main processor 100. The following are examples of functions
and operations the signal processor 400 and display processor 500 can perform:

SIGNAL PROCESSOR

• Matrix control

• 3D transformations

• Lighting

• Clipping, perspective and viewport application

• Display processor command generation

DISPLAY PROCESSOR

• Rasterization

• Texture coordinate generation

• Texture application and filtering

• Color combining

• Blending

• Fogging

• Antialiasing

• Frame buffer and frame buffer control

Figure 3 shows the main processes performed by the main processor 100, coprocessor 200 and main memory
300 in this example system 50. The main processor 100 receives inputs from the game controllers 56 and executes
the video game program provided by storage device 54 to provide game processing (block 120). It provides animation,
and assembles graphics and sound commands for use by coprocessor 200. The graphics and sound commands generated
by main processor 100 are processed by blocks 122, 124 and 126, each of which is performed by coprocessor
200. In this example, the coprocessor signal processor 400 performs 3D geometry transformation and lighting processing
(block 122) to generate graphics display commands for display processor 500. Display processor 500 "draws"
graphics primitives (e.g., lines, triangles and rectangles) to create an image for display on color TV 58. Display processor
500 performs this "drawing" or rendering function by "rasterizing" each primitive and applying a texture to it if desired
(block 126). It does this very rapidly, e.g., on the order of many millions of "pixels" (color television picture elements)
a second. Display processor 500 writes its image output into a frame buffer in main memory 300 (block 128). This
frame buffer stores a digital representation of the image to be displayed on the television screen 60. Additional circuitry
within coprocessor 200 reads the information from the frame buffer and outputs it to television 58 for display (block 130).

Signal processor 400 also processes sound commands received from main processor 100 using digital audio
signal processing techniques (block 124). Signal processor 400 writes its digital audio output into a sound buffer in
main memory 300. The main memory temporarily "buffers" (i.e., stores) the sound output (block 132). Other circuitry
in coprocessor 200 reads this buffered sound data from main memory 300 and converts it into electrical audio signals
(stereo left and right channels) for application to and reproduction by television speakers 62a, 62b (block 134).

Television 58 displays 30 or 60 new images a second. This "frame rate" fools the human eye into seeing continuous
motion, allowing main unit 52 to create animation effects on television screen 60 by changing the image slightly from
one frame to the next. To keep up with this television frame rate, coprocessor 200 must create a new image every 1/30
or 1/60 of a second. Coprocessor 200 must also be able to produce a stream of continuous sound to go along with the
animation effects on screen 60.
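
As a rough arithmetic illustration of the frame rate constraint just described, the time available to produce each image follows directly from the frame rate. The helper names `frame_budget_ms` and `audio_samples_per_frame` below are hypothetical, and the audio sample rate passed to the second helper is an assumption chosen for the example, not a figure stated in this description.

```python
def frame_budget_ms(frames_per_second):
    """Return the time available to produce one image, in milliseconds."""
    return 1000.0 / frames_per_second


def audio_samples_per_frame(sample_rate_hz, frames_per_second):
    """Return how many audio samples must be produced per video frame
    (per channel) to keep sound continuous. The sample rate is an
    assumed parameter, not a value taken from the source text."""
    return sample_rate_hz / frames_per_second
```

At 30 frames a second the budget is about 33.3 milliseconds per image, and at 60 frames a second it is about 16.7 milliseconds. With an assumed sample rate of 44,100 Hz, 735 samples per channel would be needed for each 1/60 second frame.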

Overall System Operation

Figure 4 shows the overall operation of system 50 in more detail, and Figure 4A shows overall steps performed
by the system to generate graphics. In this example, main processor 100 reads a video game program 108 stored in
main memory 300 (generally, this video game program will have originated in video game storage device 54 and have
been copied from the video game storage device into the main memory). In response to executing this video game
program 108 (and in response to inputs from game controllers 56), main processor 100 creates (or reads from storage
device 54) a list 110 of commands for coprocessor 200 (Figure 4A, block 120a). This list 110, in general, includes two
kinds of commands:

(1) graphics commands

(2) audio commands. 

Graphics commands tell coprocessor 200 what images to generate on TV screen 60. Audio commands tell
coprocessor 200 what sounds it should generate for reproduction on TV loudspeakers 62.

The list of graphics commands is called a "display list" because it controls the images coprocessor 200 displays
on the TV screen 60. The list of audio commands is called a "play list" because it controls the sounds that are played
over loudspeakers 62. Generally, main processor 100 specifies both a new display list and a new play list for each video
"frame" time of color television set 58.

In this example, main processor 100 provides its display/play list 110 to coprocessor 200 by storing it into main
memory 300 and then telling the coprocessor where to find it (Figure 4A, block 120c). Main processor 100 also makes
sure the main memory 300 contains a graphics and audio database 112 that includes all of the data coprocessor 200
will need to generate the graphics and sound requested in the display/play list 110. Some or all of this graphics and
audio database 112 can come from storage device 54. The display/play list 110 specifies which portions of graphics
and audio database 112 the coprocessor 200 should use. Main processor 100 also is responsible for making sure that
signal processor 400 has loaded "microcode," i.e., a computer program that tells the signal processor what to do.
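
The hand-off described above (the main processor stores a combined display/play list in shared memory and tells the coprocessor where to find it) might be modeled, purely as a sketch, along the following lines. The `Command` and `MainMemory` classes, the string addresses and the command names are all hypothetical; the actual command formats are not given in this part of the description.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    kind: str      # "graphics" or "audio"
    name: str      # hypothetical command name, e.g. "draw_triangle"
    data_ref: str  # hypothetical key into the graphics and audio database


@dataclass
class MainMemory:
    """Toy stand-in for the shared main memory, addressed by string keys."""
    cells: dict = field(default_factory=dict)

    def store(self, address, value):
        self.cells[address] = value
        return address

    def load(self, address):
        return self.cells[address]


def submit_frame(memory, commands, address):
    """Main processor side: store the list and return where to find it."""
    return memory.store(address, list(commands))


def split_lists(memory, address):
    """Coprocessor side: read the list and separate the two kinds of commands."""
    commands = memory.load(address)
    display_list = [c for c in commands if c.kind == "graphics"]
    play_list = [c for c in commands if c.kind == "audio"]
    return display_list, play_list
```

In this sketch only the address changes hands; the list itself stays in the shared memory, which mirrors the "store it, then say where it is" arrangement in the text.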

Signal processor 400 reads the display/play list 110 from main memory 300 (Figure 4A, block 122a) and processes
this list, accessing additional data within the graphics and audio database 112 as needed (Figure 4A, block 122b).
Signal processor 400 generates two main outputs: graphics display commands 112 for further processing by display
processor 500 (Figure 4A, block 122c); and audio output data 114 for temporary storage within main memory 300.
Signal processor 400 processes the audio data in much less than the time it takes to play the audio through loudspeakers
62. Another part of the coprocessor 200 called an "audio interface" (not shown) subsequently reads the buffered
audio data and outputs it in real time for reproduction by television loudspeakers 62.

The signal processor 400 can provide the graphics display commands 112 directly to display processor 500 over
a path internal to coprocessor 200, or it may write those graphics display commands into main memory 300 for retrieval
by the display processor (not shown). These graphics display commands 112 command display processor 500 to draw
("render") specified geometric shapes with specified characteristics (Figure 4A, block 126a). For example, display
processor 500 can draw lines, triangles or rectangles (polygons) based on these graphics display commands 112, and
may fill triangles and rectangles with particular colors and/or textures 116 (e.g., images of leaves of a tree or bricks of
a brick wall), all as specified by the graphics display commands 112. Main processor 100 stores the texture images
116 into main memory 300 for access by display processor 500. It is also possible for main processor 100 to write
graphics display commands 112 directly into main memory 300 for retrieval by display processor 500 to directly command
the display processor.

Display processor 500 generates, as its output, a digitized representation of the image that is to appear on television
screen 60 (Figure 4A, block 126b). This digitized image, sometimes called a "bit map," is stored within a frame buffer
118 residing in main memory 300. Display processor 500 can also store and use a depth (Z) buffer 118b in main memory
300 to store depth information for the image. Another part of coprocessor 200 called the "video interface" (not shown)
reads the frame buffer 118 and converts its contents into video signals for application to color television set 58 (Figure
4A, block 127). Typically, frame buffer 118 is "double buffered," meaning that coprocessor 200 can be writing the "next"
image into half of the frame buffer while the video interface is reading out the other half.
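
The "double buffered" arrangement can be sketched as two equally sized halves with a swap between them. The class below is illustrative only: `DoubleBufferedFrame`, its method names and the flat list-of-pixels layout are assumptions, not the disclosed frame buffer format.

```python
class DoubleBufferedFrame:
    """Toy double-buffered frame buffer with two equally sized halves."""

    def __init__(self, width, height):
        self.halves = [[0] * (width * height) for _ in range(2)]
        self.front = 0  # index of the half currently being read out

    @property
    def back(self):
        """Index of the half currently being drawn into."""
        return 1 - self.front

    def draw(self, index, color):
        """Rendering side: write a pixel of the "next" image."""
        self.halves[self.back][index] = color

    def scan_out(self):
        """Read-out side: return a copy of the image being displayed."""
        return list(self.halves[self.front])

    def swap(self):
        """At a frame boundary, the finished image becomes the displayed one."""
        self.front = self.back
```

Until `swap` is called, nothing written by `draw` is visible through `scan_out`, so a partly drawn image never reaches the screen.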

The various steps shown in Figure 4A and described above are "pipelined" in this example. "Pipelining" means
that different operations are performed concurrently for different stages in the graphics generation process. A simple
analogy is the way most people do laundry. A non-pipeline mode of doing laundry would involve completing all relevant
tasks (washing, drying, ironing/folding, and putting away) for one load of laundry before beginning the next load. To
save time, people with multiple loads of laundry "pipeline" the laundry process by performing washing, drying, ironing/folding
and putting away operations concurrently for different loads of laundry.

Similarly, the operations performed by main processor 100, signal processor 400, display processor 500 and video
interface 210 are "pipelined" in this example. For example, main processor 100 in this example can be assembling a
display list two video frames ahead while signal processor 400 and display processor 500 are processing data for one
video frame ahead and video interface 210 is processing data for the current video frame in progress. As is explained
below, the detailed graphics rendering steps performed by display processor 500 in block 126a are also pipelined to
maximize speed performance.
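
The frame-level pipelining described above can be pictured as a schedule in which each stage works on a different frame during the same frame time. The sketch below is a simplification under stated assumptions: the three stage labels and the `schedule` function are hypothetical, and the signal processor and display processor are lumped into one stage because the text describes both as working one frame ahead.

```python
STAGES = ["main processor", "signal/display processors", "video interface"]


def schedule(frame_time, stages=STAGES):
    """Return {stage: frame number being worked on} for one frame time.

    Stage k works on frame (frame_time - k). A value of None means the
    pipeline has not yet filled far enough for that stage to have work.
    """
    result = {}
    for k, stage in enumerate(stages):
        frame = frame_time - k
        result[stage] = frame if frame >= 0 else None
    return result
```

At frame time 2, for example, the video interface is showing frame 0 while the main processor is already assembling the list for frame 2, two frames ahead, as in the description.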

More Detailed System Architecture 

Figure 5 shows a more detailed architecture of video game system 50. This diagram shows video game main unit
52 including, in addition to main processor 100, coprocessor 200 and main memory 300, additional components such
as a clock generator 136, a serial peripheral interface 138, an audio digital-to-analog converter (DAC) 140, an audio
amplifier/mixer 142, a video digital-to-analog converter 144, and a video encoder 146.

In this example, the clock generator 136 (which may be controlled by a crystal 148) produces timing signals to
time and synchronize the other components of main unit 52. Different main unit components require different clocking
frequencies, and clock generator 136 provides suitable such clock frequency outputs (or frequencies from which suitable
clock frequencies can be derived such as by dividing). A timing block 216 within coprocessor 200 receives clocking
signals from clock generator 136 and distributes them (after appropriate dividing as necessary) to the various other
circuits within the coprocessor.

In this example, the game controllers 56 are not connected directly to main processor 100, but instead are connected
to main unit 52 through serial peripheral interface 138. Serial peripheral interface 138 demultiplexes serial data
signals incoming from up to four (or five) game controllers 56 (or other serial peripheral devices) and provides this data
in a predetermined format to main processor 100 via coprocessor 200. Serial peripheral interface 138 is bidirectional
in this example, i.e., it is capable of transmitting serial information specified by main processor 100 in addition to 
receiving serial information. 

Serial peripheral interface 138 in this example also includes a "boot ROM" read only memory 150 that stores a
small amount of initial program load (IPL) code. This IPL code stored within boot ROM 150 is executed by main processor
100 at time of startup and/or reset to allow the main processor to begin executing game program instructions
108a within storage device 54 (see Figure 5A, blocks 160a, 160b). The initial game program instructions 108a may,
in turn, control main processor 100 to initialize the drivers and controllers it needs to access main memory 300 (see
Figure 5A, blocks 160c, 160d) and to copy the video game program and data into the faster main memory 300 for
execution and use by main processor 100 and coprocessor 200 (see Figure 5A, blocks 160e, 160f, 160g).

Also in this example, serial peripheral interface 138 includes a security processor (e.g., a small microprocessor) 
that communicates with an associated security processor 152 (e.g., another small microprocessor) within storage
device 54 (see Figure 5). This pair of security processors (one in the storage device 54, the other in the main unit 52)
performs an authentication function to ensure that only authorized storage devices may be used with video game main
unit 52. See U.S. Patent No. 4,799,635. In this example, the security processor within serial peripheral interface 138 
may process data received from game controllers 56 under software control in addition to performing a security function 
under software control. 

Figure 5 shows a connector 154 within video game main unit 52. This connector 154 connects to the electrical
contacts 74 at the edge of storage device printed circuit board 70 in this example (see Figure 1). Thus, connector 154
electrically connects coprocessor 200 to storage device ROM 76. Additionally, connector 154 connects the storage
device security processor 152 to the main unit's serial peripheral interface 138. Although connector 154 in the particular
example is used primarily to read data and instructions from a non-writable read only memory 76, system 52 is designed
so that the connector is bidirectional, i.e., the main unit can send information to the storage device 54 in addition to
reading information from it.

Figure 5 also shows that the audio and video outputs of coprocessor 200 are processed by some electronics 
outside of the coprocessor before being sent to television set 58. In particular, in this example coprocessor 200 outputs 
its audio and video information in digital form, but conventional home color television sets 58 generally require analog 
audio and video signals. Therefore, the digital outputs of coprocessor 200 are converted into analog form, a function
performed for the audio information by DAC 140 and for the video information by VDAC 144. The analog audio output
of DAC 140 is amplified by an audio amplifier 142 that may also mix audio signals generated externally of main unit
52 and supplied through connector 154. The analog video output of VDAC 144 is provided to video encoder 146, which
may, for example, convert "RGB" input signals to composite video outputs. The amplified stereo audio output of amplifier
142 and the composite video output of video encoder 146 are provided to home color television set 58 through a
connector not shown. 

As shown in Figure 5, main memory 300 stores the video game program in the form of CPU instructions 108b.
These CPU instructions 108b are typically copied from storage device 54. Although CPU 100 in this example is capable
of executing instructions directly out of storage device ROM 76, the amount of time required to access each instruction
from the ROM is much greater than the time required to access instructions from main memory 300. Therefore, main
processor 100 typically copies the game program/data 108a from ROM 76 into main memory 300 on an as-needed
basis in blocks, and accesses the main memory in order to actually execute the instructions (see Figure 5A, blocks
160e, 160f). The main processor 100 preferably includes an internal cache memory to further decrease instruction
access time.

Figure 5 shows that storage device 54 also stores a database of graphics and sound data 112a needed to provide 
the graphics and sound of the particular video game. Main processor 100 reads the graphics and sound data 112a 
from storage device 54 on an as-needed basis and stores it into main memory 300 in the form of texture data 116, 
sound data 112b and graphics data 112c. In this example, display processor 500 includes an internal texture memory
502 into which the texture data 116 is copied on an as-needed basis for use by the display processor.

Storage device 54 also stores coprocessor microcode 156. As described above, in this example signal processor
400 executes a computer program to perform its various graphics and audio functions. This computer program, or
"microcode," is provided by storage device 54. Because the microcode 156 is provided by storage device 54, different
storage devices can provide different microcodes, thereby tailoring the particular functions provided by coprocessor
200 under software control. Typically, main processor 100 copies a part of the microcode 156 into main memory 300
whenever it starts the signal processor, and the signal processor 400 then accesses other parts of the microcode on
an as-needed basis. The signal processor 400 executes the microcode out of an instruction memory 402 within the
signal processor 400. Because the SP microcode 156 may be too large to fit into the signal processor's internal instruction
memory 402 all at once, different microcode portions may need to be loaded from main memory 300 into the
instruction memory 402 to allow signal processor 400 to perform different tasks. For example, one part of the SP
microcode 156 may be loaded into signal processor 400 for graphics processing, and another part of microcode may
be loaded into the signal processor for audio processing. In this example, the signal processor microcode RAM 402
(and an additional signal processor data memory RAM not shown in Figure 5) is mapped into the address space of
main processor 100 so the main processor can directly access the RAM contents under software control through load
and store instructions.
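The microcode swapping described above can be sketched as a toy Python model. The class, the overlay table and the byte contents are hypothetical; the 4 kilobyte size of instruction memory 402 is the figure given later in the description of signal processor 400.

```python
IMEM_SIZE = 4096  # bytes in instruction memory 402 (4 kilobytes per the description)


class SignalProcessorModel:
    """Toy model: instruction memory holds one microcode part at a time."""

    def __init__(self):
        self.imem = bytearray(IMEM_SIZE)
        self.loaded = None

    def load_overlay(self, name, main_memory, offset, length):
        """Copy one part of the microcode from main memory into instruction memory."""
        if length > IMEM_SIZE:
            raise ValueError("overlay does not fit in instruction memory")
        self.imem[:length] = main_memory[offset:offset + length]
        self.loaded = name


# Hypothetical layout: two microcode parts stored back to back in main memory.
main_memory = bytearray(b"G" * 3000 + b"A" * 2500)
OVERLAYS = {"graphics": (0, 3000), "audio": (3000, 2500)}

sp = SignalProcessorModel()
for task in ("graphics", "audio"):
    offset, length = OVERLAYS[task]
    sp.load_overlay(task, main_memory, offset, length)
```

The two parts together (5500 bytes) exceed the instruction memory, so they must be loaded one after the other, which is the situation the text describes.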

Main Processor 100 

Main processor 100 in this example is a MIPS R4300 RISC microprocessor designed by MIPS Technologies, Inc.,
Mountain View, California. This R4300 processor includes an execution unit with a 64-bit register file for integer and
floating-point operations, a 16 KB Instruction Cache, an 8 KB Write Back Data Cache, and a 32-entry TLB for
virtual-to-physical address calculation. The main processor 100 executes CPU instructions (e.g., a video game program) 108
in kernel mode with 32-bit addresses. 64-bit integer operations are available in this mode, but 32-bit calling conventions
are preferable to maximize performance. For more information on main processor 100, see, for example, Heinrich,
MIPS R4000 Microprocessor User's Manual (MIPS Technologies, Inc., 1994, Second Ed.).

Main processor 100 communicates with coprocessor 200 over bus 102, which in this example comprises a bi- 
directional 32-bit SysAD multiplexed address/data bus, a bi-directional 5-bit wide SysCMD bus, and additional control 
and timing lines. See chapter 12 et seq. of the above-mentioned Heinrich manual. 

The conventional R4300 main processor supports six hardware interrupts, one internal (timer) interrupt, two software
interrupts, and one non-maskable interrupt (NMI). In this example, three of the six hardware interrupt inputs (INT0,
INT1 and INT2) and the non-maskable interrupt (NMI) input allow other portions of system 50 to interrupt the main
processor. Specifically, main processor INT0 is connected to allow coprocessor 200 to interrupt the main processor,
main processor interrupt INT1 is connected to allow storage device 54 to interrupt the main processor, and main processor
interrupts INT2 and NMI are connected to allow the serial peripheral interface 138 to interrupt the main processor.
Any time the processor is interrupted, it looks at an internal interrupt register to determine the cause of the interrupt
and then may respond in an appropriate manner (e.g., to read a status register or perform other appropriate action).
All but the NMI interrupt input from serial peripheral interface 138 are maskable (i.e., the main processor 100 can
selectively enable and disable them under software control).
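The interrupt routing and masking just described can be sketched in Python as follows. The bit positions and names are illustrative assumptions and do not reproduce the actual R4300 interrupt register layout; only the source assignments and the non-maskable behavior of NMI come from the text.

```python
from enum import IntFlag


class Irq(IntFlag):
    # Bit positions are illustrative only, not the real R4300 register layout.
    INT0 = 1  # coprocessor 200
    INT1 = 2  # storage device 54
    INT2 = 4  # serial peripheral interface 138
    NMI = 8   # serial peripheral interface 138 (non-maskable)


SOURCE = {
    Irq.INT0: "coprocessor 200",
    Irq.INT1: "storage device 54",
    Irq.INT2: "serial peripheral interface 138",
    Irq.NMI: "serial peripheral interface 138 (NMI)",
}


def delivered(pending, enabled):
    """Pending interrupts that reach the processor; NMI ignores the mask."""
    return (pending & enabled) | (pending & Irq.NMI)


def causes(pending, enabled):
    """Names of the sources behind the delivered interrupts, in bit order."""
    active = delivered(pending, enabled)
    return [name for irq, name in SOURCE.items() if irq & active]
```

For instance, with INT0, INT2 and NMI pending but only INT0 enabled, `causes` reports the coprocessor and the NMI source, while the masked INT2 request is held back.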

Main processor 100 reads data from and writes data to the rest of system 50 via the CPU-to-coprocessor bus 102.
The coprocessor 200 performs a memory mapping function, allowing the main processor 100 to address main memory
300, the storage device cartridge ROM 76, the "boot ROM" 150 within serial peripheral interface 138 (and other parts
of the serial peripheral interface), various parts of coprocessor 200 (including signal processor RAM 402), and other
parts of system 50.

In the example, the operations performed by main processor 100 are completely dependent on video game program
108. In this example, all "system" software is supplied by the storage device 54 to provide maximum flexibility. Different
video games (or other applications) may run more efficiently with different kinds of high level software. Therefore, main
unit 52 in this example does not provide any standard software libraries or any software at all for that matter, since
such libraries could limit flexibility. Instead, all software in this example is supplied by storage device 54.

Developers of video game software 108 may wish to employ advanced software architecture such as, for example,
device drivers, schedulers and thread libraries to manage the various resources within system 50. Since main processor
100 is a state-of-the-art RISC processor/computer, it is appropriate to use such software architecture/constructs and
to implement video game program 108 in a high level software environment.

An example system "memory map" of the main processor 100 address space is shown in Figure 5B. As shown in
this Figure 5B, main memory 300 is divided into two banks (bank 0 and bank 1) in this example. In addition, certain
configuration registers 307 within the main memory 300 are mapped into the main processor address space, as are
registers within coprocessor 200. Main processor 100 in this example can control each of the various coprocessor
subblocks by writing, under control of video game program 108, into control registers associated with each coprocessor
200 sub-block.

As shown in Figure 5B, storage device 54 address space is divided into two "domains" (for two different devices, 
for example). These "domains" are mapped into several parts of the main processor 100 address space. Various parts
of the serial peripheral interface 138 (i.e., PIF boot ROM 150, a PIF buffer RAM, and a PIF status register) are also 
mapped into the main processor 100 address space. 

Unified Main Memory 300 

Main memory 300 in this example comprises a RDRAM dynamic random access memory available from Rambus 
Inc. of Mountain View, California. In this example, main memory 300 is expandable to provide up to 8 megabytes of 
storage, although main unit 52 may be shipped with less RAM (e.g., 2 or 3 MB) to decrease cost.

Main memory 300 provides storage for the entire system 50 in this example. It provides a single address space 
(see Figure 5B above) for storing all significant data structures, including for example (as shown in Figure 5): 

• Main processor instructions 108
• Signal processor microcode 156
• Display list graphic commands 110a
• Play list audio commands 110b
• Texture maps 116 and other graphics data 112c
• Color image frame buffer 118a
• Depth (z) buffer 118b
• Sound data 112b
• Audio output buffer 114
• Main processor working values
• Coprocessor working values
• Data communicated between various parts of the system.

Advantages and disadvantages in using single address space memory architectures for raster scan display systems
are known (see, for example, Foley et al, Computer Graphics: Principles and Practice at 177-178 (2d Ed.
Addison-Wesley 1990)). Many video game (and other graphics) system architects in the past rejected a single address space
architecture in favor of using dedicated video RAM devices for graphics data and using other types of memory devices
for other types of data. However, a unified main memory 300 provides a number of advantages in this particular example
of a video game system 50. For example:

Data communications between system elements is simplified. Once data is stored in main memory 300, there is
little or no additional overhead in communicating the data to another part of the system. The overhead of transferring
data between different parts of the system is thus minimized. For example, since the main processor 100 and each
sub-block within the coprocessor 200 can each access system main memory 300, the main memory used by all system
elements for data structure storage can also be used as a general purpose communication channel/data buffer between
elements.

For example, display lists 110 main processor 100 stores within main memory 300 can be directly accessed by
signal processor 400. Similarly, display commands the main processor (and/or the signal processor) stores within the
main memory can be directly accessed by display processor 500. The main processor 100 working data (which can
automatically be written into the main memory 300 via a "cache flush") is immediately available to all other parts of the
system.

The unified memory provides memory allocation flexibility. Main memory 300 locations look alike, and therefore
each location can be used for storing any type of data structure. All main memory 300 allocation decisions are left to
the application programmer. This provides great flexibility in terms of data structure sizes and memory usage. Data
structures can be stored anywhere in main memory 300, and each location in memory 300 can be allocated however
the application programmer specifies.

For example, one video game programmer might provide a large frame buffer for high resolution images and/or 
image scrolling and panning, while another programmer may decide to use a smaller frame buffer so as to free up 
memory space for other data structures (e.g., textures or audio data). One application may devote more of main memory 
300 storage for audio data structures and less to graphics data, while another application may allocate most of the
storage for graphics related data. The same video game program 108 can dynamically shift memory allocation from
one part of game play to another (e.g., at the time the game changes levels) to accomplish different effects. Application 
flexibility is not limited by any fixed or hardwired memory allocation. 
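The trade-off between frame buffer size and memory left for other data can be made concrete with a short Python calculation. The resolutions, the 16-bit pixel depth and the use of two buffers are hypothetical choices for illustration; the 2 MB total is one of the example RAM sizes mentioned above.

```python
MB = 1024 * 1024


def frame_buffer_bytes(width, height, bytes_per_pixel):
    """Storage needed for one frame buffer."""
    return width * height * bytes_per_pixel


def remaining_bytes(total, width, height, bytes_per_pixel, buffers=1):
    """Main memory left for textures, audio, code, etc. after the frame buffers."""
    return total - buffers * frame_buffer_bytes(width, height, bytes_per_pixel)


# Hypothetical allocations in a 2 MB main memory, two 16-bit buffers each.
low_res_left = remaining_bytes(2 * MB, 320, 240, 2, buffers=2)
high_res_left = remaining_bytes(2 * MB, 640, 480, 2, buffers=2)
```

Under these assumptions the lower resolution leaves 1,789,952 bytes for other data structures, while the higher resolution leaves 868,352 bytes.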

The Unified RAM architecture supports flexible data structure sharing and usage. Since all significant data structures
are stored within common main memory 300, they can all be accessed by main processor 100 and other system
elements. There is no hardware distinction between display images and source images. For example, main processor
100 can, if desired, directly access individual pixels within frame buffer 118. The scan conversion output of display 
processor 500 can be used as a texture for a texture mapping process. Image source data and scan converted image 
data can be interchanged and/or combined to accomplish special effects such as, for example, warping scan-converted 
images into the viewpoint. 

The shortcomings of a unified memory architecture (e.g., contention for access to the main memory 300 by different
parts of the system) have been minimized through careful system design. Even though main memory 300 is accessed
over a single narrow (9-bit-wide) bus 106 in this example, acceptable bandwidth has been provided by making the bus
very fast (e.g., on the order of 240 MHz). Data caches are provided throughout the system 50 to make each subcomponent
more tolerant to waiting for main memory 300 to become available.
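A rough feel for the bandwidth of such a narrow, fast bus can be had from the Python arithmetic below. It is only a sketch under stated assumptions: one transfer per clock cycle and 8 of the 9 bus bits carrying data are assumptions not stated in the text, and the 153,600-byte block size is a hypothetical example.

```python
def peak_bytes_per_second(clock_hz, data_bits=8, transfers_per_clock=1):
    """Peak data rate; data_bits and transfers_per_clock are assumptions."""
    return clock_hz * transfers_per_clock * data_bits // 8


def transfer_time_ms(num_bytes, clock_hz):
    """Best-case time to move num_bytes over the bus, ignoring all overhead."""
    return 1000.0 * num_bytes / peak_bytes_per_second(clock_hz)


BUS_CLOCK_HZ = 240_000_000  # "on the order of 240 MHz" per the text
peak = peak_bytes_per_second(BUS_CLOCK_HZ)
block_ms = transfer_time_ms(153_600, BUS_CLOCK_HZ)
```

Under these assumptions the peak rate is 240 million bytes per second, so a 153,600-byte block moves in about 0.64 ms at best; contention and protocol overhead would lengthen this in practice.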

Coprocessor 200 

Figure 5 shows that coprocessor 200 includes several components in addition to signal processor 400 and display 
processor 500, namely:

• CPU interface 202,
• a serial interface 204,
• a parallel peripheral interface 206,
• an audio interface 208,
• a video interface 210,
• a main memory DRAM controller/interface 212,
• a main internal bus 214 and
• a timing block 216.

In this example, main bus 214 allows each of the various main components within coprocessor 200 to communicate
with one another. 

Figure 6, a more detailed diagram of coprocessor 200, shows that the coprocessor is a collection of processors, 
memory interfaces and control logic all active at the same time and operating in parallel. The following briefly describes 
the overall functions provided by each of these other sub-blocks of coprocessor 200: 

• Signal processor 400 is a microcoded engine that executes audio and graphics tasks.
• Display processor 500 is a graphics display pipeline that renders into frame buffer 118.
• Coprocessor serial interface 204 provides an interface between the serial peripheral interface 138 and coprocessor
200 in this example.
• Coprocessor parallel peripheral interface 206 interfaces with the storage device 54 or other parallel devices connected
to connector 154.
• Audio interface 208 reads information from audio buffer 114 within main memory 300 and outputs it to audio DAC
140.
• Coprocessor video interface 210 reads information from frame buffer 118a within main memory 300 and outputs
it to video DAC 144.
• The CPU interface 202 is the gateway between main processor 100, coprocessor 200 and the rest of system 50.
• DRAM controller/interface 212 is the gateway through which coprocessor 200 (and main processor 100) accesses
main memory 300. Memory interface 212 provides access to main memory 300 for main processor 100, signal
processor 400, display processor 500, video interface 210, audio interface 208, and serial and parallel interfaces
204, 206.

Each of these various processors and interfaces may be active at the same time. 

Signal processor 400 in this example includes the instruction memory 402 discussed above, a data memory 404,
a scalar processing unit 410 and a vector processing unit 420. Instruction memory 402 stores microcode for execution
by scalar unit 410 and/or vector unit 420. Data memory 404 stores input data, work data and output data for the scalar 
unit 410 and for the vector unit 420. Signal processor 400 can execute instructions only out of instruction memory 402 
in this example, but has access to main memory 300 via direct memory accessing (DMA) techniques. 

In this example, scalar unit 410 is a general purpose integer processor that executes a subset of the MIPS R4000
instruction set. It is used to perform general purpose operations specified by microcode within instruction memory 402.
Vector unit 420 comprises eight 16-bit calculating elements capable of performing numerical calculations in parallel.
Vector unit 420 is especially suited for graphics matrix calculations and certain kinds of digital audio signal processing 
operations. 

Display processor 500 in this example is a graphics display pipelined engine that renders a digital representation
of a display image. It operates based on graphics display commands generated by the signal processor 400 and/or
main processor 100. Display processor 500 includes, in addition to texture memory 502, a rasterizer 504, a texture
unit 506, a color combiner 508, a blender 510 and a memory interface 512. Briefly, rasterizer 504 rasterizes polygon
(e.g., triangle, and rectangle) geometric primitives to determine which pixels on the display screen 60 are within these
primitives. The texture unit can apply texture maps stored within texture memory 502 onto textured areas defined by
primitive edge equations solved by rasterizer 504. The color combiner 508 combines and interpolates between the
texture color and a color associated with the graphic primitive. Blender 510 blends the resulting pixels with pixels in
frame buffer 118 (the pixels in the frame buffer are accessed via memory interface 512) and is also involved in performing
Z buffering (i.e., for hidden surface removal and anti-aliasing operations). Memory interface 512 performs read,
modify and write operations for the individual pixels, and also has special modes for loading/copying texture memory
502, filling rectangles (fast clears), and copying multiple pixels from the texture memory 502 into the frame buffer 118.
Memory interface 512 has one or more pixel caches to reduce the number of accesses to main memory 300.
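The per-pixel flow through the color combiner, blender and Z buffer can be sketched in Python as below. This is a simplified illustration, not the display processor's actual arithmetic: colors are floating-point RGB tuples, a smaller z value is assumed to be nearer the viewer, and all function and parameter names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation from a (t=0) to b (t=1)."""
    return a + (b - a) * t


def combine(texel, prim_color, tex_weight):
    """Color combiner: interpolate between primitive color and texture color."""
    return tuple(lerp(p, t, tex_weight) for t, p in zip(texel, prim_color))


def blend(src, dst, alpha):
    """Blender: mix the new pixel (src) with the frame buffer pixel (dst)."""
    return tuple(lerp(d, s, alpha) for s, d in zip(src, dst))


def shade_pixel(frame, zbuf, x, y, z, texel, prim_color, tex_weight, alpha):
    """Read-modify-write of one pixel with a depth test (smaller z is nearer)."""
    if z >= zbuf[y][x]:
        return False  # hidden behind what is already drawn
    color = combine(texel, prim_color, tex_weight)
    frame[y][x] = blend(color, frame[y][x], alpha)
    zbuf[y][x] = z
    return True


frame = [[(0.0, 0.0, 0.0)]]
zbuf = [[1.0]]
drawn = shade_pixel(frame, zbuf, 0, 0, 0.5,
                    texel=(1.0, 0.0, 0.0), prim_color=(0.0, 0.0, 1.0),
                    tex_weight=0.5, alpha=1.0)
```

A second call at the same location with a larger z would fail the depth test and leave both buffers unchanged, which is the hidden surface removal role of the Z buffer.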

Display processor 500 includes circuitry 514 that stores the state of the display processor. This state information 
is used by the rest of display processor 500 to, for example, select rendering modes and to ensure that all previous
rendering affected by a mode change occurs before the mode change is implemented.

The command list for display processor 500 usually comes directly from signal processor 400 over a private "X 
bus" 218 that connects the signal processor to the display processor. More specifically, X-bus 218 in this example is 
used to transfer graphics display commands from the signal processor data memory 404 into a command buffer (not 
shown in Figure 6) within display processor 500 for processing by the display processor. However, in this example it
is also possible for signal processor 400 and/or main processor 100 to feed graphics display commands to display
processor 500 via main memory 300. 

Display processor 500 accesses main memory 300 using physical addresses to load its internal texture memory 
502, read frame buffer 118 for blending, read the Z buffer 118b for depth comparison, to write to the Z-buffer and the
frame buffer, and to read any graphics display commands stored in the main memory.

Coprocessor Internal Bus Architecture 

Figure 6A is a more detailed diagram showing an example coprocessor bus 214 arrangement, which in this example
comprises a 32-bit address ("C") bus 214C and a 64-bit data ("D") bus 214D. These busses 214C, 214D are connected
to each of signal processor 400, display processor 500, CPU interface 202, audio interface 208, video interface 210,
serial interface 204, parallel peripheral interface 206, and main memory (RAM) interface 212. As shown in Figure 6A,
main processor 100 and each of the sub-blocks of coprocessor 200 communicate with main memory 300 via internal
coprocessor busses 214C, 214D, and main memory interface/controller 212a/212b.

In this example, main memory interface/controller 212a, 212b converts main memory addresses asserted on coprocessor
address bus 214C into 9-bit-wide format for communication over the 9-bit-wide main memory multiplexed
address/data bus 106, and also converts between the main memory bus 106 9-bit-wide data format and the coprocessor
data bus 214D 64-bit wide data format. In this example, the DRAM controller/interface 212 includes, as a part thereof,
a conventional RAM controller 212b (see Figure 6A) provided by Rambus Inc. The use of a 9-bit-wide main memory
bus 106 reduces the chip pin count of coprocessor 200.

In this example, each of the coprocessor 200 sub-blocks shown has an associated direct memory access (DMA)
circuit that allows it to independently address and access main memory 300. For example, signal processor DMA circuit
454, display processor DMA circuit 518, audio interface DMA circuit 1200, video interface DMA circuit 900, serial
interface DMA circuit 1300, and parallel peripheral interface DMA circuit 1400 each allow their associated coprocessor
sub-block to generate addresses on coprocessor address bus 214C and to communicate data via coprocessor data
bus 214D (additionally, display processor 500 has a further memory interface block 512 for access to the main memory
frame buffer 118 and texture data 116).

Although each of the coprocessor 200 sub-blocks can independently access main memory 300, they all share
common busses 214C, 214D in this example, and only one of the sub-blocks can use these shared busses at a time.
Accordingly, coprocessor 200 has been designed to make most efficient use of the shared busses 214. For example,
the coprocessor 200 sub-blocks may buffer or "cache" information to minimize the frequency of different bus accesses
by the same sub-block and to make the sub-blocks more tolerant of temporary bus unavailability. A private bus 218
allows signal processor 400 to communicate with display processor 500 without having to wait for main bus 214 to
become available.

Also as shown in Figure 6A, each of the sub-blocks of coprocessor 200 includes control/status registers that can 
be accessed by main processor 100 via CPU interface 202. For example, signal processor registers 407, display 
processor registers 507, audio interface registers 1207, video interface registers 907, serial interface registers 1307, 
parallel peripheral interface registers 206, RAM interface registers 1007a, and RAM controller registers 1007b are 
each mapped into the main processor 100 address space. The main processor 100 can read from and/or write to these
various registers under control of game program 108 to directly control the operation of sub-blocks within coprocessor
200. 

Signal Processor 400 

Figure 7 shows the architecture of signal processor 400 of this example in more detail. As explained above, signal 
processor 400 includes a scalar unit 410, a vector unit 420, an instruction memory 402 and a data memory 404. In this 
example, scalar unit 410 is a 32-bit integer processor that executes a sub-set of the MIPS 4000 instruction set. Vector 
unit 420 (which is defined as a "CP1" coprocessor of scalar unit 410 under the MIPS 4000 architecture) performs
integer calculations (e.g., multiplications, additions, subtractions and multiply/accumulates) on eight 16-bit sets of values
in parallel.

Vector unit 420 can perform the same operation on eight pairs of 16-bit operands in parallel. This
makes signal processor 400 especially suited for "sum of products" calculations such as those found in matrix multiplications,
texture resampling, and audio digital signal processing such as, for example, digital audio synthesis and
spatial and frequency filtering.
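A "sum of products" over eight lanes can be sketched in Python as follows. The function names are hypothetical and the model ignores instruction encoding and accumulator width; it only illustrates how one matrix row can be applied to eight vertices at once, one vertex per 16-bit lane.

```python
LANES = 8  # eight 16-bit operands processed in parallel, per the text


def vmac(acc, a, b):
    """Lane-wise multiply-accumulate: acc[i] += a[i] * b[i] for all eight lanes."""
    assert len(acc) == len(a) == len(b) == LANES
    return [s + x * y for s, x, y in zip(acc, a, b)]


def transform_row(row, xs, ys, zs):
    """Apply one matrix row (m0, m1, m2, m3) to eight vertices (w assumed 1)."""
    acc = [0] * LANES
    ones = [1] * LANES
    for coeff, component in zip(row, (xs, ys, zs, ones)):
        acc = vmac(acc, [coeff] * LANES, component)
    return acc


xs = list(range(8))
result = transform_row((2, 0, 0, 5), xs, [0] * 8, [0] * 8)
```

Four multiply-accumulate steps produce one transformed coordinate for all eight vertices, where a scalar processor would need thirty-two.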

Signal processor 400 uses a RISC (reduced instruction set computer) architecture to provide high performance 
machine control based on instructions residing in the instruction memory 402. In this example, execution unit 430 includes
a program counter 432 that is used to address instruction memory 402 over path 434. This program counter 432 can
access only the 4 kilobyte instruction space within instruction memory 402 in this example, requiring that all instructions
to be executed by the signal processor first be placed into the instruction memory. Execution unit 430 generates output
control signals 436 based on the particular instructions currently being executed. These output control signals 436
control all other parts of signal processor 400, and are sequenced to manage pipelined instruction processing. Scalar
unit 410 and vector unit 420 are controlled by these control signals 436. For example, scalar unit 410 may address
data memory 404 via path 438 to read data from and/or write data into the data memory using load/store block 440.
Data path 414 may perform tests based on results of calculations and provide resulting condition outputs to execution
unit 430 via path 442. This execution unit 430 may use these condition outputs to perform a conditional branch or jump,
loading program counter 432 with the appropriate (next) address into instruction memory 402. Because scalar processor
410 has these more general capabilities, it is used in this example for general purpose functions such as, for
example, control flow, address calculation and the like, in addition to providing 32-bit integer calculations.

Execution unit 430 executes immediate, jump and register instruction formats in accordance with the standard
MIPS R4000 instruction set. Figure 7A shows an example of a register instruction format 450 and how signal processor
400 uses that register instruction format to access three 128-bit wide words 452 within data memory 404. Register
instruction format 450 may include a 6-bit operation code field 450(a), a 5-bit source register specifier 450(b), a 5-bit
target (source/destination) register specifier 450(c), a 5-bit destination register specifier 450(d), and a parameter field
450(e). The parameter field 450(e) may specify shift amounts and/or functions, and together with operation code 450(a)
defines the operation to be performed. Each of fields 450(b), 450(c) and 450(d) specifies a location within data
memory 404, and thus each designates a 128-bit word.
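The field widths listed above match the standard MIPS register (R-type) layout, so a decoder can be sketched in Python as follows. The bit positions are those of the standard MIPS format, with the remaining 11 bits treated as the parameter field (a 5-bit shift amount and a 6-bit function code); the function and key names are hypothetical.

```python
def decode_register_format(word):
    """Split a 32-bit register-format instruction into its fields."""
    param = word & 0x7FF  # parameter field 450(e): low 11 bits
    return {
        "opcode": (word >> 26) & 0x3F,  # 450(a), 6 bits
        "rs": (word >> 21) & 0x1F,      # 450(b), 5-bit source specifier
        "rt": (word >> 16) & 0x1F,      # 450(c), 5-bit target specifier
        "rd": (word >> 11) & 0x1F,      # 450(d), 5-bit destination specifier
        "shamt": (param >> 6) & 0x1F,   # shift amount part of 450(e)
        "funct": param & 0x3F,          # function part of 450(e)
    }


# Standard MIPS encoding of "add $3, $1, $2".
fields = decode_register_format(0x00221820)
```

For the sample word, the decoder yields opcode 0, source 1, target 2, destination 3, shift amount 0 and function code 0x20.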

As shown in Figure 7B, vector unit 420 treats each of these 128 bit words as a concatenated sequence of eight 
16-bit values, and operates on each of the 16-bit values in parallel. The operations of vector unit 420 are invoked by
instructions within the CP1 type instructions typically reserved for floating point operations in the MIPS R4000 instruction
set (signal processor 400 has no floating point unit in this example). 
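Treating a 128-bit word as eight 16-bit values can be sketched in Python as shown below. Two details are assumptions made for illustration: lane 0 is taken to be the most significant 16 bits, and each lane wraps around on overflow (the text does not say whether the hardware wraps or saturates).

```python
MASK16 = 0xFFFF
LANES = 8


def unpack(word):
    """Slice a 128-bit integer into eight 16-bit lanes (lane 0 = top bits)."""
    return [(word >> (16 * (LANES - 1 - i))) & MASK16 for i in range(LANES)]


def pack(lanes):
    """Concatenate eight 16-bit lanes back into one 128-bit integer."""
    word = 0
    for value in lanes:
        word = (word << 16) | (value & MASK16)
    return word


def vadd(a, b):
    """Lane-wise add of two 128-bit words; no carry crosses a lane boundary."""
    return pack([(x + y) & MASK16 for x, y in zip(unpack(a), unpack(b))])


total = vadd(pack([1, 2, 3, 4, 5, 6, 7, 8]), pack([10] * 8))
```

The point of the lane structure is visible in the overflow case: adding 1 to a lane holding 0xFFFF clears that lane without disturbing its neighbors.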

Scalar unit 410 includes a register file 412 comprising 32 registers, each register being 32 bits wide. Scalar unit
also includes a data path 414 comprising adders, shifters, and other logic required to execute integer calculations and
other operations. Register file 412 is similar to the general purpose register file defined by the MIPS R4000 architecture,
and accepts instructions in R4000 format. Data path 414 includes an integer multiplier/divider, and operates in conjunction
with an execution unit 430 that receives 64-bit wide instructions from instruction memory 402.

Vector unit 420 includes eight sets of register files 422(0)-422(7) and eight sets of corresponding data paths
423(0)-423(7). Data paths 423 each include a 16-bit multiplier, a 16-bit adder and a 48-bit accumulator (48-bit accumulation
accommodates audio filters with a large number of taps, and also accommodates partial products wherein a series of
16-bit multiplies and sums is used to obtain a 32-bit result for certain graphics calculations requiring more than 16-bit
precision). Each of register files 422 comprises 32 registers, each of which is 32 bits wide. A 128-bit wide data path
444 connects vector unit 420 to load/store block 440, and another 128-bit wide data path 446 connects the load/store
block 440 to data memory 404. Data memory 404 stores 4096 (4KB) words, each word being 128 bits wide. When a
word in data memory 404 is retrieved for use by vector unit 420, it is sliced into eight 16-bit segments, with each
segment being sent to a different register file 422 within vector unit 420 (see Figure 7B). Figure 7C shows an example
add operation performed by vector unit 420. When vector unit 420 writes to a destination addressed within data memory
404, each of register files 422 contributes 16 bits which are combined into a 128-bit word before being written into the
data memory (see Figure 7A). Alternatively, load/store block 440 includes a steering multiplexer arrangement (not
shown) that can steer 16-bit sub-words within the data memory 128-bit word to/from different vector unit register files
422, with the particular sub-word and the particular vector unit register file being selectable based on instructions
from instruction memory 402. Similarly, load/store block 440 includes a further steering multiplexer arrangement (not
shown) that can steer different sized data units (e.g., bytes, 16-bit half-words, or 32-bit words) between data memory
404 and scalar unit 410, with the particular data unit and size being specified by instructions within instruction memory
402. See, for example, the description of Load and Store "Byte", "Halfword", "Word", "Word Left" and "Word Right" in
Heinrich, MIPS R4000 Microprocessor User's Manual (2d Ed. 1994).
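The slicing of a 128-bit word into eight 16-bit lanes, and a lane-by-lane add of the kind mentioned for the vector unit, might be modeled as follows. This is an illustrative sketch rather than the hardware behavior: the lane order (most significant lane first) and the wrap-around on overflow are assumptions, and all names are hypothetical.

```python
LANES = 8
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def split_word(word128):
    """Slice a 128-bit integer into eight 16-bit lanes, most significant first."""
    return [(word128 >> (LANE_BITS * (LANES - 1 - i))) & LANE_MASK
            for i in range(LANES)]


def join_lanes(lanes):
    """Combine eight 16-bit lanes back into one 128-bit integer."""
    word = 0
    for lane in lanes:
        word = (word << LANE_BITS) | (lane & LANE_MASK)
    return word


def vector_add(a128, b128):
    """Add two 128-bit words lane by lane; each lane wraps modulo 2**16 (assumed)."""
    sums = [(x + y) & LANE_MASK
            for x, y in zip(split_word(a128), split_word(b128))]
    return join_lanes(sums)
```

Because each lane is masked independently, a carry out of one 16-bit value never spills into its neighbor, which is the essential property of operating on the eight values in parallel.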

Signal processor 400 also includes a DMA controller 454 and CPU control registers 456. DMA controller 454 is
connected to the coprocessor internal bus 214, and is used to transfer data into and out of instruction memory 402
and/or data memory 404. For example, DMA controller 454 can copy microcode modules 156 from main memory 300
into signal processor instruction memory 402. DMA controller 454 may also be used to transfer information between
data memory 404 and main memory 300. DMA controller 454 can be commanded by execution unit 430, and receives
DMA address and data information from scalar unit data path 414 over path 438. DMA controller 454 may also be
commanded by main processor 100 via CPU control registers 456. CPU control registers 456 are mapped into the
main processor 100 address space, and can be accessed by signal processor 400 and execution unit 430 using MIPS
"CP0" instruction formats.

Figures 7D-7L show example CPU control registers 456. The registers shown in Figures 7D-7H are used to control
and/or monitor the DMA controller 454.

For example, the SP-DRAM DMA address register 458 shown in Figure 7D can be written to or read from by main
processor 100 (as well as SP execution unit 430), and is used to specify a starting DMA address within instruction
memory 402 or data memory 404. SP memory DMA address 460 shown in Figure 7E is used to specify a starting DMA
address in main memory 300. Read and write DMA length registers 462, 464 shown in Figures 7F and 7G, respectively,
specify the length of a block of data to be transferred between signal processor 400 and main memory 300, with the
direction of transfer depending upon which one of these two registers is used to specify the block length. DMA status
registers 466, 468 shown in Figures 7H and 7I, respectively, can be read by main processor 100 to determine whether
DMA controller 454 is full or busy, respectively.

Figure 7J shows the main SP status register 470 within CPU control registers 456. SP status register 470 acts as
an SP control register when it is written to by main processor 100 (top diagram of Figure 7J), and indicates SP status
when read by the main processor (bottom diagram in Figure 7J). When used as a status register, SP status register
470 tells main processor 100 whether the SP is halted (field 471), whether the SP is operating in a breakpoint mode
(field 472), whether the DMA controller 454 is busy (field 474) or full (field 475), whether SP I/O is full (field 476),
whether the SP is operating in single step mode (field 477), whether the SP is operating in a mode in which it won't
generate an interrupt upon reaching a breakpoint (field 478), and whether the SP has generated various general
purpose "signals" 479 that can be defined under software control to provide status concerning various
software-dependent parameters. Main processor 100 can write to register 470 to stop or start signal processor 400 (fields 480,
481), to clear breakpoint mode (field 482), to clear or set an interrupt mode (fields 483, 484), to clear or set single step
mode (fields 485, 486), to clear or set an interrupt on breakpoint mode (fields 487, 488), and to clear or set the various
software-dependent "signals" (fields 489, 490).

Figure 7K shows an additional SP register 491 used as a "semaphore" for general purpose communications
between the main processor 100 and the signal processor 400. This register 491 contains a flag that main processor 100
sets upon reading the register and clears upon writing to the register. Signal processor 400 can also set or clear this flag.
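A register that sets its flag on read and clears it on write behaves like a test-and-set primitive. The following sketch models that behavior under one assumption not stated in the text, namely that a read returns the flag value as it was before the read set it. The class and function names are hypothetical.

```python
class SemaphoreRegister:
    """Toy model of semaphore register 491 (read sets the flag, write clears it)."""

    def __init__(self):
        self.flag = 0

    def read(self):
        # Assumed: the read returns the previous flag value, then sets the flag.
        previous = self.flag
        self.flag = 1
        return previous

    def write(self, _value=0):
        # Any write clears the flag, releasing the semaphore.
        self.flag = 0


def try_acquire(sem):
    """Return True if the caller obtained the semaphore (the flag was clear)."""
    return sem.read() == 0
```

Under this model, a processor that reads 0 owns the shared resource until it writes the register again, and any other reader in the meantime sees 1.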

Figure 7L shows an SP instruction memory BIST status register 492 that is used as a BIST control register when
written to by main processor 100 (top diagram in Figure 7L) and indicates BIST status when read by the main processor
(bottom diagram of Figure 7L). Program counter 432 is preferably also mapped into the CPU control registers 456 so
that it can be written to and read from by main processor 100.

Signal Processor Microcode 

The particular functions signal processor 400 performs depend on the SP microcode 156 provided by storage
device 54. In this example, SP microcode 156 provides both graphics and audio processing functions. As explained
above, the main tasks performed by signal processor 400 for graphics processing include reading a display list,
performing 3-dimensional geometry transformation and lighting calculations, and generating corresponding graphics
display commands for use by display processor 500. In more detail, signal processor 400 performs the following overall
graphics functions under control of microcode 156:

• Display list processing
• Matrix definition
• Vertex generation and lighting
• Texture definition/loading
• Clipping and culling
• Display processor command setup
• Flow control

Signal processor 400 performs the following overall functions under control of microcode 156 to process audio:

• Play list processing
• Digital audio synthesis/processing
• Writing digital audio samples to main memory audio buffer 114

Task Lists 

Main processor 100 tells signal processor 400 what to do by providing the signal processor with a task list. The
microcode 156 program that runs on signal processor 400 is called a task. Main processor 100 (and thus the video
game program 108 supplied by storage device 54) is responsible for scheduling and invoking tasks on signal processor
400. The task list contains all of the information signal processor 400 needs to begin task execution, including pointers
to the microcode 156 routines it needs to run in order to perform tasks. Main processor 100 provides this task list under
control by game program 108.

Figure 8 shows an example of a task list 250. The task list 250 may reference one or more display lists and/or play
lists 110. These display lists or play lists 110, in turn, may reference additional data structures including other display
lists or play lists. A display list 110 can point to other display lists and/or graphics data. Similarly, a play list can reference
other play lists and/or sound data. In this example, display lists and play lists can be thought of as hierarchical data
structures up to ten levels deep. Signal processor 400 processes the display lists and play lists of the stack, pushing
and popping the current display list pointer. All display lists must terminate with an "end" command. For example,
display list 110(1) shown in Figure 8 references another display list 110(2). Display list 110(2) references graphics data
112 needed to execute the list. Similarly, play list 110(4) shown in Figure 8 references sound data 112B.
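The pushing and popping of the current display list pointer can be sketched as a small interpreter loop. This is a simplified model under stated assumptions: the command tuples ("call", "draw", "end"), the dictionary of named lists, and the interpretation of "ten levels deep" as the current list plus nine saved return pointers are all hypothetical choices made for illustration.

```python
MAX_DEPTH = 10  # hierarchy depth described in the text


def walk_display_list(lists, start):
    """Return the non-control commands in execution order.

    lists maps a list name to a sequence of commands; a command is
    ("call", name), ("end",) or any other tuple treated as work to do.
    """
    executed = []
    stack = []                # saved (list name, next index) return pointers
    name, index = start, 0
    while True:
        command = lists[name][index]
        index += 1
        if command[0] == "call":
            if len(stack) + 1 >= MAX_DEPTH:
                raise RecursionError("display list hierarchy too deep")
            stack.append((name, index))   # push the current list pointer
            name, index = command[1], 0
        elif command[0] == "end":
            if not stack:
                return executed           # end of the outermost list
            name, index = stack.pop()     # pop back to the calling list
        else:
            executed.append(command)
```

The requirement that every list terminate with an "end" command is what lets the loop find its way back to the calling list without any length field.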

For graphics animation, it is desirable to "double buffer" only parts of the display list 110 that change from one
frame to another. In this way, only the data that changes from one frame to the next needs to be "double buffered",
thus conserving space in main memory 300. Swapping between double buffers is efficiently done by changing segment
base addresses within task lists 250 and by organizing the hierarchical display lists in an appropriately efficient manner.
Display lists or fragments of display lists can be chained together for more efficient memory utilization.

Figure 9 shows an example process performed by main processor 100 to invoke processing of a new task list by
signal processor 400. Main processor 100 first loads the task (display) list into main memory 300 (block 601). It then
halts signal processor 400 (or checks to ensure that the signal processor is halted) by writing to and/or reading from
SP status register 470 (block 602). Main processor 100 then writes to SP DMA registers 458, 460, 462 to load an initial
microcode module into signal processor instruction memory 402 (block 604, Figure 9). Main processor 100 next stores the
address in main memory 300 of the task (display) list loaded by block 601 into signal processor data memory 404
(block 606, Figure 9). Main processor 100 then resets the signal processor program counter 432 (block 608, Figure
9), and writes to SP status register 470 to start the signal processor 400 (block 610, Figure 9). The signal processor
400 typically then uses its DMA controller 454 to fetch the task (display) list from main memory 300 into its data memory
404.
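The ordering of the Figure 9 steps can be summarized with a toy model. This sketch only mirrors the sequence of blocks 602-610; it does not model real register addresses or DMA timing, and every class, attribute and parameter name in it is hypothetical.

```python
class SignalProcessorModel:
    """Toy stand-in for the SP state touched by the Figure 9 sequence."""

    def __init__(self):
        self.halted = False
        self.pc = 0x123      # arbitrary stale program counter value
        self.imem = b""      # instruction memory contents
        self.dmem = {}       # data memory, modeled as named slots
        self.log = []        # order in which the steps were performed


def launch_task(sp, main_memory, task_list_addr, microcode_addr, microcode_len):
    # Block 601 is assumed done: the task list already sits at task_list_addr.
    sp.halted = True                                  # block 602: halt the SP
    sp.log.append("halt")
    end = microcode_addr + microcode_len
    sp.imem = main_memory[microcode_addr:end]         # block 604: DMA microcode
    sp.log.append("dma_microcode")
    sp.dmem["task_list_addr"] = task_list_addr        # block 606: pass the list address
    sp.log.append("store_task_list_addr")
    sp.pc = 0                                         # block 608: reset program counter
    sp.log.append("reset_pc")
    sp.halted = False                                 # block 610: start the SP
    sp.log.append("start")
```

Halting first matters in this sequence because instruction memory and the program counter are both rewritten before execution resumes.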

Now that signal processor 400 has a task list and is started, it proceeds to perform each of the operations requested
in the task list. It continues to execute the task list until it reaches the end of the task list, at which time it stops and
waits for main processor 100 to provide a new task list. Generally, main processor 100 provides a new task list once
each video frame, although, as discussed above, in many cases only a portion of the task list and/or the display and/or
play lists the task list references may actually change from one frame to the next. Portions of the task list in main
memory 300 may be "double buffered" so the main processor 100 can be writing to one buffer while signal processor
400 reads from another buffer. Before the next video frame, the main processor 100 can change a pointer to give the
signal processor 400 access to the new buffer.

As signal processor 400 executes the task list, it retrieves additional SP microcode 156 modules from main memory
300 as needed to perform the specified tasks. For example, signal processor 400 may use its DMA facility 454 to load
particular graphics microcode into instruction memory 402 to execute graphics commands specified by a task list, and
may similarly retrieve and load audio processing microcode routines to perform audio processing specified by the task
list. Different microcode routines or "overlays" may be loaded on an as-needed basis to more optimally handle particular
types of graphics and/or audio processing operations. As one example, the signal processor 400 may load special
lighting graphics routines as overlays to perform particular lighting operations, and may load clipping routines or
overlays to perform particular culling operations. Microcode loading and reloading into signal processor 400 during
execution of the single task list 250 is necessary in this example because signal processor instruction memory 402 is not
large enough to store all of SP microcode 156, and the signal processor is designed so that it can execute instructions
only out of its internal instruction memory.

Figure 10 shows an example of a simplified graphics process performed by signal processor 400 based on a
display list 110. In this simplified process, the display list 110 first commands signal processor 400 to set various
attributes defining the overall graphical images that are to be rendered by the co-processor. Such attributes include,
for example, shading, lighting, Z buffering, texture generation, fogging and culling (Figure 10, block 612). The display
list next commands signal processor 400 to define a modeling/viewing matrix and a projection matrix (Figure 10, block
614). Once the appropriate matrices have been defined, the display list commands signal processor 400 to transform
a set of vertices based on the modeling/viewing matrix and the projection matrix defined by block 614 and also based
on the attributes set by block 612 (Figure 10, block 616). Finally, the display list commands signal processor 400 to
generate a graphics display (e.g., triangle) command that directs display processor 500 to render a primitive based on
the vertices generated by block 616 and the attributes set by block 612 (Figure 10, block 618). Signal processor 400
may, in response to step 618, transfer the display processor command it has generated (or the address of the command,
which the signal processor may store in its data memory 404 or in main memory 300) for access and execution by
display processor 500.

Figure 11 shows an overall process 620 performed by signal processor graphics microcode 156 to process a
display list 110 (e.g., to perform the type of process shown in Figure 10). Signal processor 400 gets the next display
list command and determines what kind of a command it is (Figure 11, block 622). Display list commands in this
example generally have six different types:

• Signal processor attribute command
• Display processor command
• Matrix command
• Vertex command
• Triangle command
• Flow control command

If the display list command is a signal processor attribute command, signal processor 400 sets signal processor
attributes as specified by the command (Figure 11, block 624). In this example, the following types of SP attribute
command are defined:

• shading
• lighting
• Z-buffering
• texturing
• fogging
• culling

The following are example SP attribute command formats and associated definitions:

SIGNAL PROCESSOR ATTRIBUTE COMMANDS:



G_SETGEOMETRYMODE:

command | command

This command "sets" some of the rendering pipeline state. This state is maintained in the signal processor 400,
and a SET/CLEAR interface is presented to the user.

Bits which are "on" in the command field are turned ON in the internal state.

G_SHADE                 Enable vertex shading or use primitive color to paint the polygon (default is vertex
                        shading).
G_LIGHTING              Enable lighting calculations.
G_SHADING_SMOOTH        Enable smooth or flat shading (the default, with this bit cleared, is flat shading).
G_ZBUFFER               Enable z-buffer depth calculations.
G_TEXTURE_GEN           Enable automatic generation of the texture coordinates S & T. After transformations,
                        a spherical mapping will be used to replace any S & T value originally given with the
                        vertex.
G_FOG                   Enable fog coefficient to be generated and replace the vertex alpha. Large alphas
                        are more foggy (farther).
G_TEXTURE_GEN_LINEAR    Enable linearization of the texture coordinates generated when G_TEXTURE_GEN
                        is set. For example, this allows the use of a panoramic texture map when performing
                        environment mapping.
G_LOD                   Enable generation of level of detail (LOD) value for mipmapped textures and
                        texture-edge mode.
G_CULL_FRONT            Cull the front-facing polygons.
G_CULL_BACK             Cull the back-facing polygons.

G_CLEARGEOMETRYMODE:

Same as G_SETGEOMETRYMODE, but this command "clears" some of the rendering pipeline state (bits which
are "on" in the command field are turned OFF in the internal state).
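The SET/CLEAR interface described above amounts to OR-ing bits into, or masking bits out of, a state word held by the signal processor. A minimal sketch follows; the bit positions are placeholders chosen purely for illustration, since the source does not give the actual encodings, and the function names are hypothetical.

```python
# Placeholder bit positions (not taken from the source).
G_SHADE = 1 << 0
G_LIGHTING = 1 << 1
G_ZBUFFER = 1 << 2
G_CULL_BACK = 1 << 3


def set_geometry_mode(state, bits):
    """Bits that are on in the command field are turned ON in the state."""
    return state | bits


def clear_geometry_mode(state, bits):
    """Bits that are on in the command field are turned OFF in the state."""
    return state & ~bits
```

With this interface a display list can change one mode, for example turning off z-buffering for an overlay, without having to know or restate the rest of the pipeline state.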



G_LIGHT:



command | param | length=16
seg | address

light.r | light.g | light.b | 0x00
light.r | light.g | light.b | 0x00
light.x | light.y | light.z | 0x00

This command passes a light to the rendering pipeline. There can be up to 7 directional lights (numbered 1-7) plus
an ambient light. The param specifies which light number (n) to replace with this light description. Use the
G_NUM_LIGHTS command to specify how many of the 8 lights to use. If the number of lights specified is N, then the
first N lights (1-N) will be the ones used, and the (N+1)th light will be the ambient light. The "param" field should be set
based on a value maintained in data memory 404 + (n-1) x 2.

The ambient light is defined by a color: light.r, light.g, light.b (unsigned 8 bit integers) which should be set to the
color of the ambient light multiplied by the color of the object which is to be drawn. (If you are lighting a texture mapped
object just use the color of the ambient light.) (For ambient lights the light.x, light.y, and light.z fields are ignored.) The
ambient light cannot be turned off except by specifying a color of black in this example.

Directional lights are specified by a color: light.r, light.g, light.b (unsigned 8 bit integers) which, like the ambient
light color, should be set to the color of the light source multiplied by the color of the object which is to be drawn.
Directional lights also have a direction. The light.x, light.y, light.z fields (signed 8 bit fractions with 7 bits of fraction)
indicate the direction from the object to the light. There must be at least one directional light (if G_LIGHTING is enabled
in the G_SETGEOMETRYMODE command) turned on, but if its color is black it will have no effect on the scene.

The G_NUM_LIGHTS command should always be used sometime after the G_LIGHT command(s) and before the next
G_VTX command, even if the number of lights has not changed.



G_NUM_LIGHTS:

command | param | length=8
seg | address

0x8000 | 32x(1+N)
0x00000000



N = number of diffuse light sources (1-7).

This command specifies how many lights should be used. It should always be used after the G_LIGHT command
and before the next G_VTX command. The parameter specifies the number of diffuse light sources (N), which must be at
least 1 and not more than 7. The ambient light source will be light number N+1 and the directional light sources will be
lights numbered 1 through N.
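The numbering rule (directional lights 1 through N, ambient light N+1) and the 32x(1+N) value shown in the command format can be checked with a short helper. This is a sketch for illustration only; the function names and the returned dictionary layout are hypothetical.

```python
def light_slots(n_diffuse):
    """Return which light numbers are directional and which one is ambient."""
    if not 1 <= n_diffuse <= 7:
        raise ValueError("number of diffuse lights must be between 1 and 7")
    return {
        "directional": list(range(1, n_diffuse + 1)),  # lights 1 through N
        "ambient": n_diffuse + 1,                      # light number N+1
    }


def num_lights_word(n_diffuse):
    """Value written as 32x(1+N) in the G_NUM_LIGHTS data word."""
    light_slots(n_diffuse)   # reuse the range check
    return 32 * (1 + n_diffuse)
```

For N = 3, for instance, lights 1-3 are directional, light 4 is the ambient light, and the data word carries 128.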



EP 0 778 536 A2



G_SETOTHERMODE_H:

command | shift | len | word

This command sets the high word of the "other" modes in the display processor, including blending, texturing, and
frame buffer parameters. The signal processor 400 remembers the high and low words of the display processor 500
"other" state, in order to present a simple set-command interface. Although this is a display processor command, it
must be parsed and interpreted by the signal processor 400 and therefore cannot be sent directly to the display
processor without first going through the signal processor.

The shift and len parameters in this command are used to construct a mask:

(((0x01 << len) - 1) << shift)

This mask is used to clear those bits in the display processor 500 status word. New bits from the word parameter are
OR'd into the status word (the parameter word must be pre-shifted).
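The clear-then-OR update can be written out as a short function. This sketch assumes the mask is built as (((0x01 << len) - 1) << shift), that is, len one-bits positioned at shift, and that the status word is 32 bits wide; the function and parameter names are hypothetical.

```python
def set_other_mode(status, shift, length, word):
    """Replace a bit field of the status word with pre-shifted new bits."""
    mask = ((0x01 << length) - 1) << shift   # 'length' one-bits starting at 'shift'
    cleared = status & ~mask                 # clear the field being replaced
    return (cleared | word) & 0xFFFFFFFF     # OR in the new bits (assumed 32-bit word)
```

For example, replacing a 3-bit field at bit 4 of an all-ones status word with the value 0b101 leaves every other bit untouched:

```python
result = set_other_mode(0xFFFFFFFF, shift=4, length=3, word=0b101 << 4)
```

Because the word parameter is OR'd in as given, the caller is responsible for shifting it into position, as the text notes.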

G_SETOTHERMODE_L:

Same as G_SETOTHERMODE_H, but affects the low word of the "other" modes on the display processor 500.

G_TEXTURE:

command | s scale | t scale | mipmap level | tile num | on

This command turns texture mapping ON/OFF, provides texture coordinate scaling, and selects the tile number
(within a tiled texture). Scale parameters are in the format of (.16) and scale the texture parameters in vertex commands.
Texture on/off turns on and off the texture coordinate processing in the geometry pipeline. Tile number corresponds to
tiles chosen in the raster portion of the pipeline. The tile num also holds the maximum levels for level of detail (LOD)
(mip-mapping).



G_LOOKAT_X:

command | param | length=16
seg | address

0x00000000
0x00000000
0x00



This command is used for automatic texture coordinate generation. It is used to describe the orientation of the eye
so that the signal processor 400 knows with respect to what to generate texture coordinates. The XYZ values (8 bit
signed fractions with 7 bits of fraction) describe a vector in worldspace (the space between the MODELVIEW matrix
and the PROJECTION matrix) which is perpendicular to the viewer's viewing direction and pointing towards the viewer's
right.


G_LOOKAT_Y: 

Same as G_LOOKAT_X, but the first zero words in the addressed segment are zero (0x00000000). 

DP Command Generation

Referring back to Figure 11: if the next display list command is one intended for display processor 500, signal
processor 400 simply writes the command to the display processor (block 626 of Figure 11). Block 626 can either DMA
the display processor command into display processor 500 via the X-bus 218, or it can deposit the display processor
command in a buffer within main memory 300 for access by the display processor.

MATRIX COMMANDS 






If the next display list command is a matrix command, signal processor 400 updates the state of the current matrix
it is using (Figure 11, block 628) and places the updated matrix on the matrix stack (block 630). As mentioned above,
in this example signal processor 400 maintains a 10-deep modeling/viewing matrix stack. New matrices can be loaded
onto the stack, multiplied (concatenated) with the top of the stack, or popped off of the stack. In this example, signal
processor 400 maintains a "one-deep" projection matrix. Therefore, new matrices can be loaded onto or multiplied with
the current projection matrix, but cannot be pushed or popped.

In this example, the modeling/viewing matrix stack resides in main memory 300. The video game program 108
must allocate enough memory for this stack and provide a pointer to the stack area in task list 250. The format of the
matrix is optimized for the signal processor's vector unit 420. To provide adequate resolution, signal processor 400 in
this example represents each matrix value in 32-bit "double precision", with an upper 16-bit signed integer portion
(indicating the part of the value greater than 1) and a lower 16-bit fractional portion (indicating the part of the value
between 0 and 1). However, vector unit 420 in this example operates on 16-bit wide values and cannot directly multiply
32-bit wide values. The matrix format (which is shown in Figure 12B) groups all of the integer parts of the elements,
followed by all of the fractional parts of the elements. It allows signal processor 400 to more efficiently manipulate the
matrix by multiplying 16-bit integer parts and 16-bit fractional parts separately without having to repeatedly "unpack" or
"pack" the matrix.

For example, vector unit 420 can multiply each of the 16-bit fixed point signed integer values in a matrix row in
one operation, and it can multiply each of the 16-bit fractional portions of the same row in another operation. These two
partial results can be added together to obtain a 32-bit double precision value, or they can be used separately (e.g.,
for operations that require only the integer part of the result or only the fractional part of the result). This matrix
representation thus allows signal processor 400 to efficiently process 32-bit precision values even though vector unit
420, in this example, operates on 16-bit values and has no explicit "double precision" capability.
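The idea of combining two 16-bit partial products into one wider result can be illustrated for a single matrix element multiplied by a 16-bit integer coordinate. This is a sketch, not the vector unit's actual instruction sequence: it assumes the 32-bit value is a two's complement number whose upper half is the signed integer part and whose lower half is an unsigned fraction, and the function names are hypothetical.

```python
def split_s15_16(raw):
    """Split a signed 32-bit fixed-point value into (integer part, fraction).

    The fraction is the low 16 bits (unsigned); the integer part is the
    floor, so integer + fraction / 65536 equals raw / 65536.
    """
    frac = raw & 0xFFFF
    integer = (raw - frac) >> 16
    return integer, frac


def mul_by_int16(raw, x):
    """Multiply a fixed-point value by a 16-bit integer via two partial products."""
    integer, frac = split_s15_16(raw)
    high = integer * x           # integer part times x (one 16-bit multiply)
    low = frac * x               # fractional part times x (another 16-bit multiply)
    return (high << 16) + low    # accumulate both partial products
```

The sum of the two partial products equals the product computed on the full 32-bit value, which is why the integer and fractional halves can be stored and multiplied separately.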

The following are example signal processor matrix commands and associated formats: 

Example Matrix Commands: 



command | param | length
address

m00 int | m00 frac
m10 int | m10 frac

The matrix command points to a 4x4 transformation matrix (see Figure 12B) that will be used to transform the
subsequent geometry, in a manner controlled by the flags in the parameter field. The length is the size of the incoming matrix
in bytes. A 4x4 matrix pointed to by this command has the following format: it is a contiguous block of memory, containing
the 16 elements of the matrix in ROW MAJOR order. Each element of the matrix is in a fixed point format, S15.16. The
length of a 4x4 matrix in bytes should be 64 bytes. The segment id and address field are used to construct the main
memory 300 address of the actual matrix (see G_SEGMENT SP command for more information).
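A 4x4 matrix of sixteen S15.16 elements occupies 16 x 4 = 64 bytes. The sketch below packs a floating-point matrix into such a block, grouping all integer parts before all fractional parts as the description of Figure 12B suggests. The big-endian byte order, the rounding rule and the function name are assumptions made for illustration, and the grouped ordering should be checked against the actual figure.

```python
import struct


def pack_matrix(m):
    """Pack a 4x4 matrix (row-major nested lists of floats) into 64 bytes."""
    raw = [int(round(v * 65536)) for row in m for v in row]  # S15.16 values
    for r in raw:
        if not -(1 << 31) <= r < (1 << 31):
            raise ValueError("element does not fit in S15.16")
    ints = [(r >> 16) & 0xFFFF for r in raw]   # upper 16 bits (two's complement)
    fracs = [r & 0xFFFF for r in raw]          # lower 16 bits
    # Assumed layout: 16 integer halves, then 16 fractional halves, big-endian.
    return struct.pack(">32H", *(ints + fracs))
```

Keeping the two halves in separate runs means a row of integer parts, or a row of fractional parts, can be fetched as consecutive 16-bit values without any repacking.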
The following flags in the parameter field are used: 



G_MTX_MODELVIEW     Identifies the incoming matrix as a modelview matrix, which is necessary to provide
                    efficient transformation of polygon normals for shading, etc. (default)
G_MTX_PROJECTION    Identifies the incoming matrix as a projection matrix, which does not affect the
                    transformation of the polygon normals for shading, etc.
G_MTX_MUL           The incoming matrix is concatenated with the current top of the matrix stack. (default)
G_MTX_LOAD          The incoming matrix replaces the current top of the (modelview or projection) matrix stack.
G_MTX_NOPUSH        The current top of the matrix stack is not pushed prior to performing the load or concat
                    operation with the top of the stack. (default)
G_MTX_PUSH          The current top of the matrix stack is pushed prior to performing the load or concat
                    operation with the top of the stack. Push is only supported with G_MTX_MODELVIEW, and
                    not with G_MTX_PROJECTION, since there is no projection matrix stack (the projection must
                    be explicitly reloaded).

This single command with the combination of parameters allows for a variety of commonly used matrix operations.
For example, (G_MTX_LOAD | G_MTX_NOPUSH) replaces the top of the stack, and (G_MTX_MUL | G_MTX_PUSH)
performs a concatenation while pushing the stack for typical modeling hierarchy construction.

For lighting and texturing, the polygon normal also must be transformed by the inverse transpose of the modelview
matrix (reference the "OpenGL Programming Guide"). This is the reason separate modelview and projection stacks
are kept, and incoming matrices must be identified.

command | param

This command pops the modelview matrix stack. The parameter field should be 0. Popping an empty stack has no
effect (the stack is not popped). Since there is no projection matrix stack, this command is supported only for the modelview matrix.
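The load, concatenate, push and pop behaviors described for the matrix commands can be modeled with a small class. This is a sketch under several assumptions: the flag values are placeholders (the defaults are simply given the value 0), the concatenation order (incoming matrix times the current top) is a guess, and the bottom stack entry is treated as the "empty stack" case that a pop leaves alone. All names other than the G_MTX_* flag names are hypothetical.

```python
# Placeholder flag values; the defaults (MODELVIEW, MUL, NOPUSH) are 0.
G_MTX_MODELVIEW, G_MTX_PROJECTION = 0x0, 0x1
G_MTX_MUL, G_MTX_LOAD = 0x0, 0x2
G_MTX_NOPUSH, G_MTX_PUSH = 0x0, 0x4
MAX_DEPTH = 10   # 10-deep modeling/viewing stack


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def scale(s):
    """Uniform scaling matrix, used here only as simple test data."""
    return [[s if i == j and i < 3 else (1 if i == j else 0)
             for j in range(4)] for i in range(4)]


class MatrixState:
    def __init__(self):
        self.modelview = [scale(1)]   # stack, top is the last entry
        self.projection = scale(1)    # "one-deep": no stack

    def matrix(self, m, flags):
        if flags & G_MTX_PROJECTION:
            if flags & G_MTX_PUSH:
                raise ValueError("push is not supported for the projection matrix")
            self.projection = m if flags & G_MTX_LOAD else matmul(m, self.projection)
            return
        if flags & G_MTX_PUSH:
            if len(self.modelview) >= MAX_DEPTH:
                raise OverflowError("modelview stack is full")
            self.modelview.append(self.modelview[-1])   # duplicate the top
        top = self.modelview[-1]
        self.modelview[-1] = m if flags & G_MTX_LOAD else matmul(m, top)

    def pop(self):
        if len(self.modelview) > 1:   # popping the last entry does nothing
            self.modelview.pop()
```

A modeling hierarchy would typically push-and-multiply when descending to a child object and pop when returning to the parent, so the parent's transform is restored automatically.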






G_VIEWPORT:

command | param | length=16
seg | address

x scale | y scale | z scale | pad
x translate | y translate | z translate | pad

This command sends a viewport structure to the graphics pipeline.

The segment id and address field are used to construct the main memory 300 address of the actual VIEWPORT
structure (see G_SEGMENT for more information).

The viewport transformation is a scale-translation of the normalized screen coordinates. In general, the viewport
must be constructed in cooperation with the projection matrix in order to meet the hardware requirements for screen
device coordinates.

The scale and translation terms for x and y have 2 bits of fraction, necessary to accommodate the sub-pixel
positioning in the hardware. The z values have no fraction.

Accounting for the fractional bits, using one of the default projection matrices, the viewport structure can be
initialized like this:

(SCREEN_WD/2)*4, (SCREEN_HT/2)*4, G_MAXZ, 0, /* scale */
(SCREEN_WD/2)*4, (SCREEN_HT/2)*4, 0, 0, /* translate */
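The multiplication by 4 in the initializer reflects the 2 bits of fraction carried by the x and y terms. A brief sketch of the same computation follows, reading the initializer as (SCREEN_WD/2)*4 and (SCREEN_HT/2)*4 for both scale and translate; the function name is hypothetical, and the maximum z value is passed in as a parameter because the source does not state the value of G_MAXZ.

```python
def make_viewport(screen_wd, screen_ht, max_z):
    """Build (scale, translate) tuples for the viewport structure.

    x and y carry 2 fractional bits, hence the factor of 4; z has no fraction.
    The fourth entry of each tuple is the pad field.
    """
    half_w = (screen_wd // 2) * 4
    half_h = (screen_ht // 2) * 4
    scale = (half_w, half_h, max_z, 0)
    translate = (half_w, half_h, 0, 0)
    return scale, translate
```

For a 320 x 240 screen this yields x and y terms of 640 and 480, which represent 160.0 and 120.0 once the two fractional bits are accounted for.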

Vertex Command Processing 

Referring once again to Figure 11, if the next display list command is a "vertex command", signal processor 400
transforms the vertices specified by the vertex command using the current matrix state (and possibly shades them using the
current lighting state), performs a clip test on the vertices, and loads the resulting vertices into a vertex buffer 408 within data
memory 404. Signal processor 400 in this example has a vertex buffer that holds up to sixteen vertices. Figure 13A
shows the signal processor 400 vertex buffer, which is fully exposed to main processor 100 and thus to video game
program 108. This internal vertex buffer 408, which can hold up to 16 points, is stored in signal processor data memory
404 and can be read by main processor 100.

Although signal processor 400 in this example can handle only lines, triangles or rectangles (i.e., surfaces defined
by 2, 3, or 4 vertices), vertex buffer 408 in this example stores up to 16 vertices so that the signal processor can reuse
transformed vertex values instead of having to recalculate the vertices each time. 3D authoring/modeling software
used to create video game program 108, in this example, should preferably organize display list 110 to maximize vertex
reuse (and thus speed performance).

Figure 13B shows an example vertex data structure signal processor 400 uses to represent each of the vertices
stored in vertex buffer 408. In this example, the transformed x, y, z, and w values corresponding to the vertex are
stored in double precision format, with the integer parts first followed by the fractional parts (fields 408(1)(a)-408(1)(h)).
The vertex color values (r, g, b, a) are stored in fields 408(1)(i)-408(1)(l), and vertex texture coordinates (s, t) are stored
in fields 408(1)(m), 408(1)(n). Additionally, in this example, the vertex values in screen space coordinates (i.e.,
transformed and projected onto the viewing plane) are stored in fields 408(1)(o)-408(1)(t) (with the one/w value stored
in double precision format). The screen coordinates are used by display processor 500 to draw polygons defined by
the vertex. The transformed 3-dimensional coordinates are maintained in vertex buffer 408 for a clipping test. Since
polygons (not vertices) are clipped, and since the vertices in vertex buffer 408 may be re-used for multiple polygons,
these transformed 3D vertex values are stored so that multiple possible clipping tests can be performed. In addition, the vertex
data structure 408(1) includes flags 408(1)(v) that signal processor 400 can use, for example, to specify clip test results
(i.e., whether the vertex falls inside or outside of each of six different clip planes). The perspective projection factor
stored in fields 408(1)(s), 408(1)(t) is retained for perspective correction operations performed by the display processor
texture coordinate unit (explained below).

The following is an example of a vertex command format used to load the internal vertex buffer with some points: 



command | n | v0 | length
seg | address

x | y | z | flag
s | t
r or nx | g or ny | b or nz | a

This command loads (n+1) points into the vertex buffer beginning at location v0 in the vertex buffer. The segment
id and address field are used to construct the main memory 300 address of the actual VTX structure (see G_SEGMENT
for more information). The number of vertices, n, is encoded as "the number minus one" in order to allow a full 16
vertices to be represented in 4 bits. The length is the number of points times 16, the size of the VTX structure (in bytes).
Vertex coordinates are 16-bit integers; the texture coordinates s and t are S10.5. The flag parameter is ignored in this
example. A vertex either has a color or a normal (for shading). Colors are 8-bit unsigned numbers. Normals are 8-bit
signed fractions (7 bits of fraction): 0x7f maps to +1.0, 0x81 maps to -1.0, and 0x0 maps to 0.0. Normal vectors must
be normalized, i.e.,

$$\sqrt{x^2 + y^2 + z^2} \le 127$$
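
As a rough illustration of the encoding rules above, the following Python sketch converts a direction vector into the signed 8-bit normal format and computes the count and length fields of a vertex command. The function names are hypothetical, and truncating toward zero is an assumption made here so that the encoded length never exceeds 127; the text does not say how a tool should round.

```python
import math

def encode_normal(nx, ny, nz):
    """Scale a direction to signed 8-bit components with length <= 127."""
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("zero-length normal")
    # Truncate toward zero (assumption) so the result cannot exceed length 127.
    comps = [int(127.0 * c / length) for c in (nx, ny, nz)]
    # Store each component as a byte in two's complement form.
    return [c & 0xFF for c in comps]

def vertex_command_fields(num_vertices, vtx_size=16):
    """Return the (n, length) fields: count minus one, and size in bytes."""
    if not 1 <= num_vertices <= 16:
        raise ValueError("a vertex command loads 1 to 16 vertices")
    return num_vertices - 1, num_vertices * vtx_size
```

For example, `encode_normal(-1.0, 0.0, 0.0)` places the byte 0x81 in the x position, matching the -1.0 mapping given above.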

Upon receiving a vertex command, signal processor 400 transforms the vertices specified in the vertex command
using the current modeling/viewing matrix (Figure 11, block 632). See Neider et al, Open GL Programming Guide
(Silicon Graphics 1993) at chapter 3 ("viewing"). These transformations orient the object represented by the vertices in
3-dimensional space relative to the selected viewpoint. For example, they may translate, rotate and/or scale the
represented object relative to a selected point of view. Such transformation calculations make heavy use of the signal
processor vector unit 420 and its ability to perform eight parallel calculations simultaneously. The transformed results
are stored in vertex data structure fields 408(1)(a)-408(1)(h) in double precision format in this example.

Clip Test :

Signal processor 400 then performs a clip test (Figure 11, block 636) to determine whether the transformed vertex
is inside or outside of the scene. Six clipping planes define the sides and ends of the viewing volume. Each transformed
vertex is compared to each of these six planes, and the results of the comparison (i.e., on which side of the clip plane
the vertex is located) are stored in vertex buffer "flags" field 408(1)(v) (see Figure 13B). These results are used by clipping
block 646 in response to a "triangle command" (see below). Note that because this example clips polygons and not
vertices, Figure 11, block 636 does not actually perform clipping; it simply tests vertex position relative to the clip planes.
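
A per-vertex clip test of this kind can be sketched in Python as below. This is only a sketch under stated assumptions: it assumes homogeneous coordinates (x, y, z, w) and a viewing volume bounded by -w <= x, y, z <= w, which is a common convention but is not specified in the text, and the bit assignments are hypothetical.

```python
# Hypothetical bit assignments, one bit per clip plane.
CLIP_LEFT, CLIP_RIGHT = 0x01, 0x02
CLIP_BOTTOM, CLIP_TOP = 0x04, 0x08
CLIP_NEAR, CLIP_FAR = 0x10, 0x20

def clip_flags(x, y, z, w):
    """Return a mask with one bit set for each plane the vertex lies outside."""
    flags = 0
    if x < -w:
        flags |= CLIP_LEFT
    if x > w:
        flags |= CLIP_RIGHT
    if y < -w:
        flags |= CLIP_BOTTOM
    if y > w:
        flags |= CLIP_TOP
    if z < -w:
        flags |= CLIP_NEAR
    if z > w:
        flags |= CLIP_FAR
    return flags
```

No geometry is modified here; the function only records on which side of each plane the vertex falls, which is the role the text assigns to block 636.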


Projection : 

Signal processor 400 then transforms the vertex values using the projection matrix (Figure 11, block 638). The
purpose of the projection transformation is to define a viewing volume, which is used in two ways. The viewing volume
determines how an object is projected onto the 2-dimensional viewing screen (that is, by using a perspective or an
orthographic projection). (See Open GL Programming Guide at 90 et seq.) The resulting transformed vertices have
now been projected from 3-dimensional space onto the 2-dimensional viewing plane with the proper foreshortening (if
the projection matrix defines a perspective projection) or orthographically (if the projection matrix defines an
orthographic projection). These screen coordinate values are also written to the vertex buffer data structure at fields
408(1)(o)-408(1)(t) (the "1/w" value is retained for later perspective correction).
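
The projection step can be pictured with the following Python sketch. It uses floating point rather than the fixed-point double precision format of the vertex buffer, and the viewport mapping (origin at the upper left, y pointing down) is an assumption for illustration; the function names are hypothetical.

```python
def apply_matrix(m, v):
    """Multiply a row-major 4x4 matrix m by the column vector v = (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def to_screen(clip, width, height):
    """Divide by w and map to pixel coordinates, keeping 1/w for later use."""
    x, y, z, w = clip
    inv_w = 1.0 / w  # retained for perspective correction
    sx = (x * inv_w * 0.5 + 0.5) * width
    sy = (1.0 - (y * inv_w * 0.5 + 0.5)) * height
    return sx, sy, z * inv_w, inv_w
```

With a perspective matrix, w grows with distance, so the division produces the foreshortening described above; with an orthographic matrix, w stays at 1 and the division has no effect.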

Lighting : 

Signal processor 400 next performs lighting calculations in order to "light" each of the vertices specified in the
vertex command. System 50 supports a number of sophisticated real-time lighting effects, including ambient (uniform)
lighting, diffuse (directional) lights, and specular highlights (using texture mapping). In order to perform lighting
calculations in this example, signal processor 400 must first load an SP microcode 108 overlay to perform the lighting
calculations. The G_SETGEOMETRYMODE command must have specified that lighting calculations are enabled, and
the lights must have been defined by the G_NUM_LIGHTS command discussed above. The part of microcode 108
that performs the lighting calculations is not normally resident within signal processor 400, but is brought in through
an overlay when lighting calls are made. This has performance implications for rendering scenes with some objects
lighted and others colored statically. In this example, the lighting overlay overwrites the clipping microcode, so to achieve
highest performance it is best to minimize or completely avoid clipped objects in lighted scenes.

To light an object, the vertices which make up the object must have normals instead of colors specified. In this
example, the normal consists of three signed 8-bit numbers representing the x, y and z components of the normal (see
the G_VTX command format described above). Each component ranges in value from -128 to +127 in this example.
The x component goes in the position of the red color of the vertex, the y into the green and the z into the blue. Alpha
remains unchanged. The normal vector must be normalized, as discussed above.

Lighting can help achieve the effect of depth by altering the way objects appear as they change their orientation.
Signal processor 400 in this example supports up to seven diffuse lights in a scene. Each light has a direction and a
color. Regardless of the orientation of the object and the viewer, each light will continue to shine in the same direction
(relative to the open "world") until the light direction is changed. In addition, one ambient light provides uniform
illumination. Shadows are not explicitly supported by signal processor 400 in this example.

As explained above, lighting information is passed to signal processor 400 in light data structures. The number of
diffuse lights can vary from 0 to 7. Variables with red, green and blue values represent the color of the light and take
on values ranging from 0 to 255. The variables with the x, y, z suffixes represent the direction of the light. The convention
is that the direction points toward the light. This means the light direction indicates the direction to the light and not the
direction that the light is shining (for example, if the light is coming from the upper left of the world, the direction might
be x = -141, y = -141, z = 0). To avoid any ambient light, the programmer must specify that the ambient light is black (0, 0, 0).

The G_LIGHT command is used to activate a set of lights on a display list. Once lights are activated, they remain on
until the next set of lights is activated. This implies that setting up a new structure of lights overwrites the old structure
of lights in signal processor 400. To turn on the lighting computation so that the lights can take effect, the lighting mode
bit needs to be turned on using the G_SETGEOMETRYMODE command.

The lighting structures discussed above are used to provide color values for storing into vertex buffer fields
408(1)(i)-408(1)(l).
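
To make the data flow concrete, here is a minimal Python sketch of how one ambient light and up to seven diffuse lights might be combined into a vertex color. The text does not give the lighting equation, so the Lambertian dot-product model, the scaling of directions so that 127 represents 1.0, and the clamp to 255 are all assumptions; `light_vertex` is a hypothetical name.

```python
def light_vertex(normal, ambient, lights):
    """Return an (r, g, b) color for a vertex.

    normal  -- signed 8-bit style components, length about 127
    ambient -- (r, g, b) in 0..255
    lights  -- list of ((r, g, b), (dx, dy, dz)); each direction points
               toward the light and is scaled like a normal
    """
    if len(lights) > 7:
        raise ValueError("at most seven diffuse lights are supported")
    color = [float(c) for c in ambient]
    for light_color, direction in lights:
        # Cosine of the angle between the normal and the direction to the light.
        dot = sum(n * d for n, d in zip(normal, direction)) / (127.0 * 127.0)
        if dot > 0.0:  # surfaces facing away receive no diffuse light
            for i in range(3):
                color[i] += light_color[i] * dot
    return tuple(min(255, int(c)) for c in color)
```

Because the direction points toward the light, a surface whose normal matches the light direction receives the full light color on top of the ambient term.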

Texture Coordinate Scaling/Creation : 

Signal processor 400 next performs texture coordinate scaling and/or creation (Figure 11, block 642). In this
example, the operations performed by block 642 may be used to accomplish specular highlighting, reflection mapping
and environment mapping. To render these effects, coprocessor 200 in this example uses a texture map of an image
of the light or environment, and computes the texture coordinates s, t based on the angle from the viewpoint to the
surface normal. This texture mapping technique avoids the need to calculate surface normals at each pixel to
accomplish specular lighting. It would be too computationally intensive for system 50 in this example to perform such surface
normal calculations at each pixel.

The specular highlight from most lights can be represented by a texture map defining a round dot with an
exponential or Gaussian function representing the intensity distribution. If the scene contains highlights from other, oddly
shaped lights such as fluorescent tubes or glowing swords, the difficulty in rendering is no greater provided a texture
map of the highlight can be obtained.

Although display processor 500 performs texture mapping operations in this example, signal processor 400
performs texture coordinate transformations for each vertex when these effects are required. Activation or de-activation
of the signal processor texture coordinate transformations is specified by a value within the G_SETGEOMETRYMODE
command (see above). In addition, the G_SETGEOMETRYMODE command can specify linearization of the generated
texture coordinates, e.g., to allow use of a panoramic texture map when performing environment mapping.

In this example, signal processor 400 texture coordinate generation utilizes the projection of the vertex normals
in the x and y directions in screen space to derive the s and t indices respectively for referencing the texture. The angle
between the viewpoint and the surface normal at each vertex is used to generate s, t. The normal projections are scaled
to obtain the actual s and t values in this example. Signal processor 400 may map the vertices "behind" the point of
view into 0, and may map positive projections into a scaled value.
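
One possible reading of this scheme is sketched below in Python. The eye axis vectors, the scale parameters, and the choice to clamp negative projections to 0 are assumptions made for illustration; the text only says that projections of the normal are scaled and that vertices "behind" the point of view may map to 0.

```python
def generate_st(normal, eye_x_axis, eye_y_axis, s_scale, t_scale):
    """Derive texture coordinates from the projections of a vertex normal.

    normal, eye_x_axis, eye_y_axis -- unit-length 3-vectors (hypothetical inputs)
    s_scale, t_scale               -- scaling values applied to the projections
    """
    # Project the normal onto the screen-space x and y directions.
    proj_x = sum(n * a for n, a in zip(normal, eye_x_axis))
    proj_y = sum(n * a for n, a in zip(normal, eye_y_axis))
    # Negative projections map to 0; positive ones are scaled (assumed reading).
    s = max(0.0, proj_x) * s_scale
    t = max(0.0, proj_y) * t_scale
    return s, t
```

Because the result depends only on the normal and the eye orientation, the highlight texture appears to slide across the surface as the object rotates, which is the effect specular and environment mapping rely on.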

In this example, texturing is activated using the G_TEXTURE command described above in the signal processor
attribute command section. This command provides, among other things, scaling values for performing the texture
coordinate mapping described above.

As explained above, the texture coordinate mapping performed by signal processor 400, in this example, also
requires information specifying the orientation of the eye so that the angle between the vertex surface normal and the
eye can be computed. The G_LOOKAT_X and the G_LOOKAT_Y commands supply the eye orientation for automatic
texture coordinate generation performed by signal processor 400. The transformed texture coordinate values, if they
are calculated, are stored by signal processor 400 in the vertex data structure at fields 408(1)(m), 408(1)(n). These
texture coordinate values are provided to display processor 500 to perform the required texture mapping using a texture
specified by the G_TEXTURE command.

Since these effects use texture mapping, they cannot be used with objects which are otherwise texture mapped.

Vertex Buffer Write : 

After performing all of these various steps, signal processor 400 writes the transformed, lighted, projected vertex 
values into vertex buffer 408 (Figure 11, block 644), and returns to parse the next display list command (block 622). 


Triangle Command Processing : 

Once signal processor 400 has written vertices into its vertex buffer 408, the display list 110 can provide a "triangle
command". The "triangle command," which specifies a polygon defined by vertices in vertex buffer 408, is essentially
a request for signal processor 400 to generate a graphics display command representing a polygon and to send that
command to display processor 500 for rendering. In this example, signal processor 400 can render three different kinds
of primitives: lines, triangles and rectangles. Different modules of microcode 108 need to be loaded in this example to
render lines or triangles. In this example, all rectangles are 2-dimensional primitives specified in screen coordinates,
and are neither clipped nor scissored.

The following is an example of a format and associated function of triangle commands:

Example of Triangle Commands 

The following command specifies a triangle defined by 3 vertices in the vertex buffer: 

G_TRI1:

    command | N | v0 | v1 | v2

This command results in one triangle, using the vertices v0, v1, and v2 stored in the internal vertex buffer. The N
field identifies which of the three vertices contains the normal or the color of the face (for flat shading).

The following command is used to control signal processor 400 to generate display processor 500 commands for
rendering a line defined by two vertices in vertex buffer 408:

G_LINE3D:

    command | N | v0 | v1

This command generates one line, using the vertices v0 and v1 in the internal vertex buffer. The N field specifies
which of the two vertices contains the color of the face (for flat shading).

Textured and filled rectangles require intervention by signal processor 400 and are thus a signal processor
operation. The following is an example command format and associated function of a texture rectangle command:

G_TEXRECT:

    command | x0 | y0 | x1 | y1

This command draws a 2D rectangle in the current fill color. The parameters x0, y0 specify the upper left corner
of the rectangle; x1, y1 specify the lower right corner. All coordinates are 12 bits.

Clipping/Setup :

Referring back to Figure 11, upon receipt of a triangle command, signal processor 400 performs any necessary
clipping of the vertices (Figure 11, block 646). This clipping operation eliminates portions of geometric primitives that
lie outside of the six clip planes defining the viewing volume.

As explained above, the results of the clip test 636 performed for each vertex are stored and available in vertex
buffer 408. With the triangle command now defining a primitive defined by those vertices, signal processor 400 can
proceed to clip the primitive. If all of the vertices of a primitive lie within the space defined by the six clip planes, the
entire primitive exists within the display space and does not need to be clipped. If all of the vertices defining a primitive
lie outside of the same clip plane (as indicated by the flags field of vertex data structure 408(1) shown in Figure 13B),
the entire primitive can be excluded from display and thus discarded. If some of the vertices defining a primitive lie
within the display space and some lie outside of it (or if all vertices lie outside of the display space but define a primitive
which passes through the displayed space), the primitive needs to be clipped and new vertices defined. These tests
and operations are performed by clipping block 646 in this example.
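
The three cases just described map naturally onto bitwise operations over the stored per-vertex flags. The Python sketch below assumes each vertex carries a six-bit mask with one bit per clip plane (set when the vertex is outside that plane); the function name and return strings are hypothetical.

```python
def classify_primitive(vertex_flags):
    """Decide how to treat a primitive from its vertices' clip flags."""
    combined_or = 0
    combined_and = 0x3F  # six clip planes, one bit each
    for flags in vertex_flags:
        combined_or |= flags
        combined_and &= flags
    if combined_or == 0:
        return "accept"   # every vertex is inside all six planes
    if combined_and != 0:
        return "reject"   # every vertex is outside the same plane
    return "clip"         # mixed case: new vertices must be generated
```

Note that the "clip" result also covers the case in which every vertex is outside the volume but no single plane excludes them all, since such a primitive may still pass through the displayed space.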

Signal processor 400 next performs backface culling (Figure 11, block 647). This operation maximizes drawing
speed by discarding polygons that can be determined to be on the backface of an object and thus hidden from view.
In this example, either front-facing, back-facing, neither or both types of primitives can be culled (i.e., discarded) by
block 647. The types of primitives to cull are specified by parameters in the G_SETGEOMETRYMODE command
described above, allowing geometry to be ordered in any direction or to be used with different culling flags to achieve
various effects (e.g., interior surfaces, two-sided polygons, etc.).

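
A common way to decide facing, shown here as a Python sketch, is to look at the sign of the triangle's area in screen space. The text does not say how block 647 makes the determination, so this method, the counter-clockwise-is-front convention, and the flag values are all assumptions.

```python
# Hypothetical culling flags.
CULL_FRONT, CULL_BACK = 0x1, 0x2

def signed_area2(p0, p1, p2):
    """Twice the signed area of a screen-space triangle."""
    return ((p1[0] - p0[0]) * (p2[1] - p0[1])
            - (p2[0] - p0[0]) * (p1[1] - p0[1]))

def is_culled(p0, p1, p2, cull_mode):
    """Return True if the triangle should be discarded under cull_mode."""
    # Assumption: positive area (counter-clockwise order) means front-facing.
    front_facing = signed_area2(p0, p1, p2) > 0
    if front_facing:
        return bool(cull_mode & CULL_FRONT)
    return bool(cull_mode & CULL_BACK)
```

Passing 0 as `cull_mode` draws both faces (useful for two-sided polygons), while passing both flags discards everything.
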
Signal processor 400 also performs some set up operations (Figure 11, block 648), and may then pass a graphics
display command to display processor 500 to control the display processor to render the primitive (Figure 11, block
650). As part of the set up operation (block 648), signal processor 400 in this example translates "segmented" addresses
in the display list 110 into physical addresses that the display processor 500 can use (the display processor is a physical
address machine in this example).

In this example, signal processor 400 uses a segment table 416 (see Figure 13C) to assist it in addressing main
memory 300. More specifically, addresses within signal processor 400 may be represented by a table entry 417A and
a 26-bit offset 417B. The table entry 417A references one of 16 base addresses within segment address table 416.
The referenced base address may be added to the offset 417B to generate a physical address into main memory 300.
Signal processor 400 constructs a main memory 300 address by adding the base address for the segment and a 26-bit
offset (which could be provided, for example, by a display list 110). The segment table 416 is constructed based on
the following example G_SEGMENT command:

G_SEGMENT:

    command | address



This command adds an entry in the segment table 416 discussed above. 

The segmented addressing used by signal processor 400 in this example can be useful to facilitate double-buffered
animation. For example, video game program 108 can keep two copies of certain display list fragments within main
memory 300, with the same offsets in two different segments. Switching copies of them is as easy as swapping the
segment pointers in signal processor 400. Another use is to group data and textures in one segment and to group
static background geometry in another segment. Grouping data might help optimize memory caching in main processor
100. All data which contains embedded addresses must be preceded by the appropriate G_SEGMENT command that
loads the signal processor 400 segment table with the proper base address.
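
The segment table lookup and the double-buffering trick can be sketched in Python as follows. The class and method names are hypothetical, and the sketch takes the segment index and offset as separate arguments rather than assuming how they are packed into a command word.

```python
OFFSET_MASK = (1 << 26) - 1  # 26-bit offset, as described in the text

class SegmentTable:
    """A table of 16 base addresses used to form physical addresses."""

    def __init__(self):
        self.base = [0] * 16

    def set_segment(self, segment, base_address):
        """Model the effect of a G_SEGMENT command."""
        if not 0 <= segment < 16:
            raise ValueError("segment index must be 0..15")
        self.base[segment] = base_address

    def to_physical(self, segment, offset):
        """Add the segment's base address to the 26-bit offset."""
        return self.base[segment] + (offset & OFFSET_MASK)
```

For double buffering, the same offset resolves to a different copy simply by pointing the segment at a different base: after `set_segment(1, 0x100000)` offset 0x40 resolves to 0x100040, and after `set_segment(1, 0x200000)` it resolves to 0x200040.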

Although signal processor 400 can use the segment addressing scheme shown in Figure 13C, this arrangement
is not available to display processor 500 in this example. Hence, part of set up processing 648 is to translate any
segment addresses that point to data structures required for rendering into physical addresses that can be used directly
by display processor 500.

DP Command Write : 

The primary output of signal processor 400 for graphics purposes is one or more commands to display processor
500 that are output by Figure 11, block 650. Although main processor 100 (or storage device 54) can directly supply
display processor 500 commands, for 3D images the signal processor 400 generally needs to perform the
transformation processes described above to generate display processor commands representing transformed, projected, lighted,
clipped, culled primitives.

The repertoire of display processor commands is set forth in Appendix A. Signal processor 400 is responsible for
appropriately formatting the display processor commands it generates, and for including the appropriate information
and address information in the commands. In addition, signal processor 400 may generate and provide certain
appropriate mode and attribute commands the display processor may require to render a particular primitive specified by
the signal processor using the appropriate parameters (although many of the mode and attribute commands for the
display processor 500 are typically supplied directly by main processor 100 under control of game program 108). As
mentioned above, main processor 100 can provide any display processor 500 command directly, but in general needs to rely on
the signal processor to generate at least some display processor commands whenever 3D objects need to be
transformed.

Flow Control Command Processing : 


Referring once again to Figure 11, if the display list command received by signal processor 400 is a flow control 
command, then signal processor 400 will respond to this command in an appropriate manner to navigate through or 
traverse the display list 110. The following example commands and formats provide flow control. 

Example Flow Control Commands :

G_DL:

    command | param | (not used)
    seg | address

This command points to another display list and is used to create display list hierarchies, nested display lists,
indirect references, etc. The segment field identifies a memory segment. The address field is the offset from the base
of that segment. Together, these form an address in main memory 300 pointing to the new display list. A length field
(not shown) may describe the length of the new display list in bytes, although in this example it is preferred that all
display lists are terminated by a G_ENDDL command. The parameter field holds flags which control the behavior of
the transfer. If the flag G_DL_NOPUSH is set, the current display list is not pushed onto the stack before transferring
control. This behaves more like a branch or go to, rather than a hierarchical display list (this may be useful to break up
a larger display list into non-contiguous memory pieces, then just connect them with display list branches).

G_ENDDL:

    command

The end display list command terminates this branch of the display list hierarchy, causing a "pop" in the processing
of the display list hierarchy. This command is most useful for constructing display list pieces of variable or unknown
size, terminated with an end command instead of providing a display list length a priori. All display lists must terminate
with this command.



G_NOOP:

    command

This command does nothing. It is generated internally under some circumstances.

Figure 11, block 652 performs the function of maintaining a display list stack in main memory 300, and of pushing
and popping (traversing) this display list stack. Block 652 halts signal processor 400 when the signal processor
encounters an "open end" display list command.
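
The push, branch and pop behavior of the flow control commands can be sketched in Python as below. Display lists are modeled as named Python lists of (opcode, param, target) tuples rather than memory addresses, the numeric value of G_DL_NOPUSH is hypothetical, and an end command with an empty stack is assumed to be the point at which processing halts.

```python
G_DL, G_ENDDL, G_NOOP = "G_DL", "G_ENDDL", "G_NOOP"
G_DL_NOPUSH = 0x01  # hypothetical flag value

def traverse(lists, start):
    """Walk a display list hierarchy and return the other opcodes in order."""
    stack = []      # saved (list name, next index) pairs
    executed = []
    current, pc = start, 0
    while True:
        opcode, param, target = lists[current][pc]
        pc += 1
        if opcode == G_DL:
            if not (param & G_DL_NOPUSH):
                stack.append((current, pc))  # call: remember where to return
            current, pc = target, 0          # with NOPUSH this is a plain branch
        elif opcode == G_ENDDL:
            if not stack:
                return executed              # nothing to pop: halt
            current, pc = stack.pop()
        elif opcode == G_NOOP:
            continue
        else:
            executed.append(opcode)
```

In this model a list entered with G_DL_NOPUSH returns, at its end command, to whichever list last pushed a return point, which is what lets one logical list be split into non-contiguous pieces.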

SIGNAL PROCESSOR MICROCODE AUDIO PROCESSING 

Signal processor 400 in this example performs digital audio processing in addition to the graphics processing
discussed above. Signal processor vector unit 420 is especially suited for performing "sum of products" calculations
that are especially useful in certain types of digital signal processing for audio signals such as, for example, audio
decompression, wavetable resampling, synthesis and filtering. Digital spatial and/or frequency filtering with a relatively
large number of taps can be accommodated without loss of precision because of the 48-bit-wide accumulators
contained within vector unit data paths 423. As one example of a particular optimum usage of vector unit 420 for audio
processing, the eight separate register files 422 and associated data paths 423 of signal processor vector unit 420 can
be used to simultaneously process eight different MIDI voices in parallel. The following are examples of additional
audio processing that can be efficiently performed using vector unit 420:

• solving polynomial equations,

• processing 8 audio voices or 8 time samples in parallel,

• wavetable synthesis using cubic interpolation, wherein four of the vector unit data paths 423 are used to process
one sample, and the other four vector unit data paths are used to process a second sample,

• audio enveloping processing wherein the 8 vector unit data paths can each multiply a different audio sample by a
different weighting factor, and

• audio mixing processing wherein the 8 vector unit data paths can each multiply a different audio sample by a
corresponding mixer weighting factor.
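
The enveloping and mixing items in the list above both reduce to eight multiplies performed side by side. The Python sketch below models the eight data paths as eight list elements; the 16-bit sample range and the Q15 fixed-point format of the weights are assumptions, and the function names are hypothetical.

```python
LANES = 8  # one element per vector unit data path

def apply_envelope(samples, gains):
    """Multiply each of eight samples by its own Q15 weighting factor."""
    if len(samples) != LANES or len(gains) != LANES:
        raise ValueError("expected eight samples and eight gains")
    return [(s * g) >> 15 for s, g in zip(samples, gains)]

def mix_voices(samples, weights):
    """Sum of products across eight voices, clamped to a 16-bit result."""
    if len(samples) != LANES or len(weights) != LANES:
        raise ValueError("expected eight samples and eight weights")
    acc = 0  # stands in for a wide accumulator; Python integers do not overflow
    for s, w in zip(samples, weights):
        acc += s * w
    return max(-32768, min(32767, acc >> 15))
```

Accumulating at full width and clamping only at the end is what a wide accumulator makes possible: intermediate sums may exceed 16 bits without losing precision.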

Because signal processor 400 can perform audio digital signal processing efficiently at high speed, it takes the
signal processor only a small fraction of an audio playback real time interval to perform and complete the digital audio
processing associated with that time interval. For example, signal processor 400 takes much less than 1/30th of a
second to digitally process audio that coprocessor audio interface 208 will play back in real time over a 1/30th of a
second time interval. Because of this capability, signal processor 400 in this example can be time-shared between
graphics processing and digital audio processing.

Generally, main processor 100 gives signal processor 400 a task list 250 at the beginning of a video frame that
specifies the image and sound to be produced during the next succeeding video frame. Coprocessor 200 must be
finished with both the audio and graphics processing for this next succeeding frame by the time that next succeeding
frame begins. Because video display and audio playback are real time continuous processes (i.e., a new video image
must be provided each video frame time, and audio must be continuously provided), coprocessor 200 needs to finish
all audio and video signal processing associated with each next succeeding video frame by the time that next frame
begins.

In this example, signal processor 400 is shared between graphics processing and digital audio signal processing.
Because of the high speed calculating capabilities of signal processor vector unit 420, signal processor 400 is able to
complete processing of the audio to be played during the next succeeding video frame in much less than the current
video frame time, and is also able to complete graphics processing for the image to be displayed during the next
succeeding frame in less than the current frame time. This allows task list 250 to specify both graphics display lists
and audio play lists that all must be completed by signal processor 400/coprocessor 200 by the beginning of the next
video frame time. However, in this example there is nothing to prevent main processor 100 from giving coprocessor
200 a task list 250 that the coprocessor cannot complete before the next video frame begins. If the combined audio
and graphics processing required by signal processor 400 is sufficiently intensive and time-consuming, the signal
processor 400 can work on processing the task list for the entire current video frame time and still not be done by the
beginning of the next video frame. It is up to video game program 108 to avoid overtaxing coprocessor 200, and to
handle any overtaxing in an appropriate manner should it occur. A video game programmer can avoid overtaxing signal
processor 400 by ensuring that all display lists 110 are organized efficiently, modeling the objects in 3-D in an efficient
manner, and taking precautions to ensure that extensive time consuming processing (e.g., clipping) is avoided or
minimized. Even with such precautions, however, it may take coprocessor 200 more than a single video frame time to
complete especially complicated images. A video game programmer can handle this situation by slowing down the
effective frame rate so that television 58 redisplays the same image stored in one part of frame buffer 118 for multiple
video frames during which time coprocessor 200 can complete processing the next image. Because the user may
perceive a variable frame rate as undesired delay, it is often best to slow down the overall effective frame rate to the
rate required for coprocessor 200 to complete the most processing-intensive images, thus preventing more complex
images from appearing more slowly than less complex images.
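
The arithmetic behind choosing a steady effective frame rate is simple: each image is held for a whole number of display frames, enough to cover the slowest image. The Python sketch below is illustrative only; the function name and the use of microseconds for the worst-case processing time are assumptions.

```python
def effective_frame_rate(display_hz, worst_case_us):
    """Return the steady image rate that allows worst_case_us per image.

    display_hz    -- the television's fixed video frame rate
    worst_case_us -- processing time of the most demanding image, in microseconds
    """
    # Whole display frames each image must be held (integer ceiling division).
    frames_held = max(1, -(-(worst_case_us * display_hz) // 1_000_000))
    return display_hz / frames_held
```

For example, at a 60 Hz display rate a worst case of 20,000 microseconds requires holding each image for two frames, giving an effective rate of 30 images per second for every image, simple or complex.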

With respect to audio processing, it is generally unacceptable to fail to provide audio for a given video frame time
since the user will hear a disturbing "click" in a stream of otherwise continuous audio. Such audio disruptions are
easily heard and can be annoying. Therefore, they should be avoided. One way to avoid an easily detectable audio
disruption in a situation where signal processor 400 has failed to complete its assigned audio processing in time is for
main processor 100 to command audio interface 208 to replay the last frame's worth of audio during the next succeeding
frame. Acceptable audio can be produced in this way without the user noticing a disruption if done carefully. Other
strategies include having signal processor 400 process multiple video frames worth of audio within a single video frame
time, thereby providing an effective audio "frame" rate that is different (faster) than the effective video frame rate. By
"effective frame rate" we mean the rate at which coprocessor 200 produces a frame's worth of information (in this
example, the television actual video frame rate stays constant).

Example Audio Software Architecture

Figure 14 shows an example of the overall software architecture provided by system 50 to synthesize and
manipulate audio. This overall software architecture 700 includes four software objects, in this example a sequence player
702, a sound player 704, a synthesis driver 706 and audio synthesis microcode 708. In this example, sequence player
702, sound player 704, and synthesis driver 706 all execute on main processor 100, and audio synthesis microcode
708 runs on coprocessor signal processor 400. Thus, sequence player 702, sound player 704 and synthesis driver
706 are each supplied as part of game program 108 of storage device 54, and audio synthesis microcode 708 is
supplied as part of SP microcode 156.

Sequence player 702, sound player 704 and synthesis driver 706 may differ depending on the particular video
game being played. In general, sequence player 702 is responsible for the playback of Type 0 MIDI music sequence
files. It handles sequence, instrument bank and synthesizer resource allocation, sequence interpretation, and MIDI
message scheduling. Sound player 704 is responsible for the playback of all ADPCM compressed audio samples. It
is useful for sound effects and other streamed audio. Synthesis driver 706 is responsible for creating audio play lists
110 which are packaged into tasks by main processor 100 under software control and passed to coprocessor 200 in
the form of task lists 250. In this example, synthesis driver 706 allows sound player 704 or other "clients" to assign
wave tables to synthesizer voices, and to control playback parameters. As discussed above, the audio synthesis
microcode 708 processes tasks passed to it and synthesizes L/R stereo 16-bit samples, which signal processor 400
deposits into audio buffers 114 within main memory 300 for playback via audio interface 208, audio DAC 140 and
amplifier/mixer 142.

In this example, synthesis driver 706 passes audio tasks to signal processor 400 in the form of audio "frames." A
"frame" is a number of audio samples, usually something close to the number of samples required to fill a complete
video frame time at the regular video frame rate (for example, 30 or 60 Hz). Although television set 58 receives and
processes audio signals in a continuous stream unconstrained by any video frame rate parameter (e.g., the television
can generate audio during horizontal and vertical video blanking and retrace), system 50 in this example organizes
audio processing in terms of video frame rate because signal processor 400, which is shared between audio and
graphics processing, must operate in accordance with the video frame rate because the graphics related tasks it performs
are tied to the video frame rate.
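
The phrase "something close to" matters because the audio sample rate is not always a whole multiple of the video frame rate. The Python sketch below shows one way to size successive audio frames so that no samples are lost over time; the sample rates used in the example are illustrative assumptions, not values taken from the text, and `frame_sizes` is a hypothetical name.

```python
def frame_sizes(sample_rate_hz, frame_rate_hz, num_frames):
    """Return the number of audio samples to produce for each video frame."""
    sizes = []
    remainder = 0  # fractional samples carried forward, in units of 1/frame_rate
    for _ in range(num_frames):
        remainder += sample_rate_hz
        count = remainder // frame_rate_hz
        remainder -= count * frame_rate_hz
        sizes.append(count)
    return sizes
```

At 44,100 samples per second and 60 frames per second every audio frame holds exactly 735 samples, while at 32,000 samples per second the sizes alternate between 533 and 534 so that the total over one second is still 32,000.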

Example Play List Processing

Figure 15 shows an example of a simple signal processor play list process. The Figure 15 process is specified by
a play list 110 generated by main processor 100 under control of video game program 108, and specified as part of a
task list 250. Thus, the Figure 15 SP play list process is an example of an output of synthesis driver 706 that is provided
to signal processor 400 in the form of an audio play list 110.

Because of the limited size of instruction memory 402 in this example, audio synthesis microcode 708 is generally
not continuously resident within signal processor 400. Instead, the initialization microcode that main processor 100 arranges
to be loaded into instruction memory 402 (see Figure 9, block 604) ensures that the appropriate audio microcode
routine is loaded into the instruction memory for audio processing (and also ensures that the appropriate graphics microcode
routine is loaded into the instruction memory for graphics processing). The steps shown in Figure 15 assume that the
audio synthesis microcode 708 is resident within the signal processor instruction memory 402, and that the signal
processor 400 is reading an audio play list 110 specifying the steps shown.

Generally, the first task of an audio play list 110 is to set up buffers within signal processor data memory 404
required to perform the audio processing task (Figure 15, block 710). Generally, this buffer set up process involves
allocating areas within data memory 404 to be used as one or more audio input buffers, and allocating an audio output
buffer within the data memory. Generally, main processor 100 also commands signal processor 400 to use its DMA
facility 454 to retrieve audio input data 112b from main memory into the allocated input buffer(s) for processing. Main
processor 100 may next set certain attributes (e.g., volume ranges and change rates) to be used for the audio
processing (Figure 15, block 712). Main processor 100 then specifies the types of signal processing to be performed by
signal processor 400 along with appropriate parameters (Figure 15, block 714). In this example, main processor 100 can
specify decompression, resampling, envelope/pan, mixing, and other processing (e.g., reverb) to be performed
individually or in combination. The audio play list 110 typically will terminate with a command to save the contents of the
output audio buffer stored in signal processor data memory 404 into main memory 300 (block 716).

Example Audio Synthesis Microcode

Figure 16 shows the overall tasks performed by audio synthesis microcode 708 in this example. Signal processor
400 under microcode control retrieves the next play list command from the current audio play list 110, and determines
what kind of command it is (Figure 16, block 718). In this example, the audio commands within an audio play list 110
may fall into the following general types:

• buffer command
• flow control command
• attribute command
• decompress command
• resample command
• envelope/pan command
• mix command
• special signal processing/effects command.
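The "retrieve the next command and determine what kind it is" step can be sketched as a dispatch loop. This is a minimal illustration only: the tuple representation of a command, the handler table and the function name are hypothetical, and the command names are those used in the example command formats of this description. No flow control command is listed because the text does not name one.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory form of a play list entry: (command name, parameters).
Command = Tuple[str, dict]

# General command type for each example command named in this description.
COMMAND_TYPES: Dict[str, str] = {
    "A_SETBUFF": "buffer", "A_LOADBUFF": "buffer", "A_CLEARBUFF": "buffer",
    "A_SAVEBUFF": "buffer", "A_SEGMENT": "buffer",
    "A_SETVOL": "attribute",
    "A_ADPCM": "decompress",
    "A_RESAMPLE": "resample",
    "A_ENVELOPE": "envelope/pan", "A_PAN": "envelope/pan",
    "A_MIXER": "mix",
    "A_REVERB": "effects",
}


def run_play_list(play_list: List[Command],
                  handlers: Dict[str, Callable[[str, dict], None]]) -> List[str]:
    """Read commands in order, classify each one, and call the handler for its type."""
    trace = []
    for name, params in play_list:
        kind = COMMAND_TYPES[name]   # determine what kind of command it is
        handlers[kind](name, params)
        trace.append(kind)
    return trace


log: List[str] = []
handlers = {kind: (lambda name, params: log.append(name))
            for kind in set(COMMAND_TYPES.values())}
play_list: List[Command] = [
    ("A_SETBUFF", {"dmemin": 0, "dmemout": 64, "count": 32}),
    ("A_SETVOL", {"volume": 0, "target": 100, "rate": 1}),
    ("A_RESAMPLE", {"flags": 0, "pitch": 1.0}),
    ("A_SAVEBUFF", {"seg": 1, "address": 0}),
]
print(run_play_list(play_list, handlers))
```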



Buffer Command Processing : 

Buffer commands manage audio buffers within signal processor data memory 404, and permit audio data to be
transferred between the data memory and main memory 300. The following are examples of buffer command formats
and associated functions:



Example Buffer Commands :

A_SETBUFF:

command | dmemin | dmemout | count

This command sets the internal signal processor data memory 404 buffer pointers and count value used by the
processing commands. This command is typically issued before any processing command. dmemin points to an input
buffer, dmemout to an output buffer and count defines the number of 16 bit samples to process.

A_LOADBUFF:

command | seg | address

This command loads a signal processor data memory 404 buffer from the main memory 300 address given by the
seg+address fields. The SP data memory buffer location and the number of 16 bit samples to load are defined by
issuing an A_SETBUFF command prior to the A_LOADBUFF command.



A_CLEARBUFF:

command | dmemin | count

This command clears an area of size count 16 bit samples starting at the signal processor 400 data memory address
given by dmemin.



A_SAVEBUFF:

command | seg | address

This command saves a buffer of 16 bit samples in the signal processor data memory 404 to the main memory 300
address given by the seg+address field. The input SP data memory buffer and number of samples are defined by
issuing an A_SETBUFF command.



A_SEGMENT:

command | seg | address

See graphics G_SEGMENT command. This command is used to map indirect "segment" addresses into main memory
300 physical addresses.

Referring again to Figure 16, signal processor audio synthesis microcode 708 performs the specified buffer
command by establishing, managing, writing data into, or reading data from the associated data memory buffer 409 (Figure
16, block 720). Typically, signal processor 400 may use its DMA facility 454 to transfer data between main memory
300 and signal processor memory 404 in order to retrieve audio input data for processing or save audio data into main
memory for playback by audio interface 208.
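The interplay of the buffer commands can be sketched with a small model in which both memories are lists of 16 bit samples. Several details here are assumptions made for illustration and are not stated in the text: addresses are counted in samples, a segment is resolved by adding a per-segment base to the address, A_LOADBUFF fills the buffer at dmemin, and A_SAVEBUFF reads the buffer at dmemout. The class and method names are hypothetical.

```python
class AudioBuffers:
    """Toy model of SP data memory, main memory and the buffer commands."""

    def __init__(self, dmem_samples: int, main_memory: list):
        self.dmem = [0] * dmem_samples   # stands in for data memory 404
        self.main = main_memory          # stands in for main memory 300
        self.segments = {}               # segment id -> base address (assumed scheme)
        self.dmemin = self.dmemout = self.count = 0

    def a_segment(self, seg: int, base: int) -> None:
        self.segments[seg] = base

    def a_setbuff(self, dmemin: int, dmemout: int, count: int) -> None:
        self.dmemin, self.dmemout, self.count = dmemin, dmemout, count

    def _physical(self, seg: int, address: int) -> int:
        # Assumed mapping: segment base plus offset.
        return self.segments[seg] + address

    def a_loadbuff(self, seg: int, address: int) -> None:
        src = self._physical(seg, address)
        self.dmem[self.dmemin:self.dmemin + self.count] = self.main[src:src + self.count]

    def a_clearbuff(self, dmem: int, count: int) -> None:
        self.dmem[dmem:dmem + count] = [0] * count

    def a_savebuff(self, seg: int, address: int) -> None:
        dst = self._physical(seg, address)
        self.main[dst:dst + self.count] = self.dmem[self.dmemout:self.dmemout + self.count]


main = [1, 2, 3, 4, 0, 0, 0, 0]
sp = AudioBuffers(16, main)
sp.a_segment(1, 0)
sp.a_setbuff(dmemin=0, dmemout=0, count=4)
sp.a_loadbuff(1, 0)    # main[0:4] -> dmem[0:4]
sp.a_savebuff(1, 4)    # dmem[0:4] -> main[4:8]
print(main)
```

Setting the buffer pointers once with A_SETBUFF and reusing them in the load and save commands keeps the transfer commands themselves small, which is the pattern the command descriptions above suggest.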

Flow Control Command Processing :

If the next play list command is a flow control command, signal processor 400 responds to the command by
traversing the current audio play list in the manner specified by the command. Nesting of audio play lists 110 is preferably
permitted, and signal processor 400 may maintain an audio play list stack in main memory 300 (just as it may do for
graphics display lists).

Attribute Command Processing : 

If the next audio play list command is an attribute command, signal processor 400 processes the command by
establishing appropriate mode and/or attribute conditions to be used for subsequent audio processing (Figure 16, block
724). In this example, audio synthesis microcode 708 supports the following example attribute command format and
associated function:

Example Attribute Commands :



A_SETVOL:

command | volume | volume target | volume rate

This command is used to set the volume parameters for subsequent processing commands. Currently this should be
issued prior to A_ENVELOPE, A_PAN and A_RESAMPLE.

Decompress Command Processing 

If the next audio play list command retrieved by signal processor 400 is a decompression command, the signal
processor performs a decompression operation to decompress a compressed audio binary stream stored in an input
buffer within data memory 404 to produce 16-bit audio samples which it stores in a defined audio output buffer within
its data memory (Figure 16, block 726). In this example, audio synthesis microcode 708 supports the following audio
decompression command format and associated function:

Example Decompression Command :

A_ADPCM:

command | flags | gain | seg | address

This command decompresses a binary stream in signal processor data memory 404 to produce 16 bit samples. The
addresses in the data memory 404 for the input and output buffers and the number of samples to process are defined
by issuing an A_SETBUFF command prior to the A_ADPCM command. The seg+address field points to a main memory
300 location which is used to save and restore state. The gain parameter is used to scale the output and is represented
as S.15.
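Reading "S.15" as a signed fixed-point value with one sign bit and 15 fractional bits, gain scaling of a 16 bit sample can be sketched as follows. The saturation to the 16 bit range and the function name are assumptions added for illustration; the text does not describe how overflow is handled.

```python
def apply_gain_s15(sample: int, gain_s15: int) -> int:
    """Scale a signed 16 bit sample by a gain stored as S.15 fixed point."""
    product = (sample * gain_s15) >> 15          # drop the 15 fractional bits
    return max(-32768, min(32767, product))      # assumed saturation to 16 bits


print(apply_gain_s15(1000, 0x4000))    # 0x4000 is 0.5 in S.15, so 500
print(apply_gain_s15(-1000, 0x4000))   # -500
```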

The flags define the behavior of the command. Currently defined flags are: 

A_INIT, The seg+address field is used to restore state at the beginning of the command. If not set, the pointer to
state is ignored upon initiation; however, state is saved to this address at the end of processing.

A_MIX, The results are mixed into the output buffer. If not set results are put into the output buffer. 
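The effect of the A_MIX flag on the output buffer can be sketched as below. The bit value assigned to A_MIX and the saturating addition are assumptions for illustration only; the text names the flag but gives neither its encoding nor its overflow behavior.

```python
A_MIX = 0x02  # hypothetical bit value


def clamp16(value: int) -> int:
    return max(-32768, min(32767, value))


def write_results(out_buf: list, results: list, flags: int) -> None:
    """Add results into the output buffer when A_MIX is set, otherwise overwrite it."""
    for i, r in enumerate(results):
        if flags & A_MIX:
            out_buf[i] = clamp16(out_buf[i] + r)   # mixed into existing contents
        else:
            out_buf[i] = clamp16(r)                # put into the buffer


mixed = [100, 32000]
write_results(mixed, [50, 1000], A_MIX)
print(mixed)        # [150, 32767]

replaced = [100, 32000]
write_results(replaced, [50, 1000], 0)
print(replaced)     # [50, 1000]
```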

Resample Command Processing : 

If the next audio play list command signal processor 400 reads is a resample command, then the signal processor
provides pitch shifting/resampling as well as integral envelope modulation based on the parameters specified in the
command (Figure 16, block 728). The following is an example of a resample command and associated function
supported by audio synthesis microcode 708.

Example Resample Command :

A_RESAMPLE:

command | flags | pitch | seg | address

This command provides pitch shifting/resampling as well as integral envelope modulation. The signal processor data
memory 404 input and output buffers and the number of samples are defined by issuing an A_SETBUFF command,
and the volume envelope parameters are defined by issuing an A_SETVOL command. Resampling factor is defined
by pitch.

The flags define the behavior of the command. Currently defined flags are: 

A_INIT, The seg+address field is used to restore state at the beginning of the command. If not set, the pointer
to state is ignored upon initiation; however, state is saved to this address at the end of processing.

A_MIX, The results are mixed into the output buffer. If not set results are put into the output buffer. 
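Pitch shifting by a resampling factor can be sketched with linear interpolation between neighboring input samples. The interpolation method, the representation of pitch as a floating-point ratio, and the use of a fractional read position as the state carried between calls are all assumptions made for illustration; the text does not specify the filter or the state format used by the microcode.

```python
def resample(inp: list, count: int, pitch: float, phase: float = 0.0):
    """Produce `count` output samples, advancing through `inp` by `pitch` per output.

    Returns the output samples and the final read position, which plays the role
    of the state that would be saved for the next call.
    """
    out = []
    pos = phase
    last = len(inp) - 1
    for _ in range(count):
        i = int(pos)
        frac = pos - i
        a = inp[min(i, last)]
        b = inp[min(i + 1, last)]
        out.append(a + (b - a) * frac)   # linear interpolation (assumed method)
        pos += pitch
    return out, pos


print(resample([0, 10, 20, 30], 4, 0.5))   # half speed: ([0.0, 5.0, 10.0, 15.0], 2.0)
print(resample([0, 10, 20, 30], 2, 2.0))   # double speed: ([0.0, 20.0], 4.0)
```

A pitch below 1 stretches the input (lower pitch) and a pitch above 1 skips input samples (higher pitch).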

Envelope/Pan Command Processing :

If the next audio play list command signal processor 400 reads is an envelope/pan command, the signal processor
performs that command by modulating one or two audio signal streams using a linear envelope (Figure 16, block 730).
An envelope command multiplies an audio input sample stream by a linear function and is thus able to ramp the volume
of the audio up or down. A "pan" command generally applies inverse linear functions to audio in left and right stereo
channels, accomplishing the effect of moving the perceived source of a sound or voice in space (i.e., from left to right
or from right to left). The following examples of envelope/pan command formats and associated functions are supported
by audio synthesis microcode 708 in this example of system 50.

Example Envelope/Pan Commands :

A_ENVELOPE:

command | flags | seg | address

This command modulates a sample stream using a linear envelope. The parameters for the volume envelope are
defined by issuing A_SETVOL and the signal processor data memory 404 buffer locations and number of samples to
process are defined by issuing an A_SETBUFF prior to issuing the A_ENVELOPE command.
The flags define the behavior of the command. Currently defined flags are: 

A_INIT, The seg+address field is used to restore state at the beginning of the command. If not set, the pointer
to state is ignored upon initiation; however, state is saved to this address at the end of processing.

A_MIX, The results are mixed into the output buffer. If not set, results are put into the output buffer.



A_PAN:

command | flags | dmemout2 | seg | address

This command provides 1 input, 2 output panning. Input, first output and number of samples are defined by issuing an
A_SETBUFF command and the panning parameters are defined by issuing an A_SETVOL command. The second
output is defined by dmemout2.

The flags define the behavior of the command. Currently defined flags are:

A_INIT, The seg+address field is used to restore state at the beginning of the command. If not set, the pointer
to state is ignored upon initiation; however, state is saved to this address at the end of processing.

A_MIX, The results are mixed into the output buffer. If not set results are put into the output buffer. 
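The linear envelope and the inverse linear functions used for panning can be sketched as follows. This is an illustration under stated assumptions: volumes are floating-point values, the envelope moves toward its target by a fixed rate per sample (mirroring the volume, volume target and volume rate fields of A_SETVOL), and the pan position runs from 0.0 (fully left) to 1.0 (fully right). Function names are hypothetical.

```python
def envelope(samples: list, volume: float, target: float, rate: float) -> list:
    """Multiply samples by a volume that ramps linearly toward `target`."""
    out = []
    v = volume
    for s in samples:
        out.append(s * v)
        if v < target:
            v = min(target, v + rate)    # ramp up without overshoot
        elif v > target:
            v = max(target, v - rate)    # ramp down without overshoot
    return out


def pan(samples: list, start: float, end: float):
    """Split one input into left/right outputs using inverse linear gains."""
    n = len(samples)
    left, right = [], []
    for i, s in enumerate(samples):
        p = start + (end - start) * i / (n - 1) if n > 1 else start
        left.append(s * (1 - p))         # left gain falls as p rises
        right.append(s * p)              # right gain rises as p rises
    return left, right


print(envelope([10, 10, 10, 10], 0.0, 1.0, 0.5))   # [0.0, 5.0, 10.0, 10.0]
print(pan([8, 8, 8], 0.0, 1.0))                    # ([8.0, 4.0, 0.0], [0.0, 4.0, 8.0])
```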



Mixing Command Processing :

If the next audio play list command is a mixing command, signal processor 400 performs a mixing function to mix
two audio input sample streams into the output audio buffer (Figure 16, block 732). The following example mixing
command format and associated function is supported by signal processor 400 and audio synthesis microcode 708 in
this example.

Example Mixer Command :

A_MIXER:

command | gain | dmemoutf

This command provides a double precision mixing function. The single precision input is added to the double precision
output after multiplication by gain. dmemoutf points to a signal processor data memory 404 area which stores the
fractional part of the mixed stream. The input buffer, number of samples and integer part of the mixed output are defined
by issuing an A_SETBUFF prior to the A_MIXER.
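Double precision mixing with separate integer and fractional parts can be sketched by treating the two parts as one 32 bit fixed-point accumulator. The details below are assumptions for illustration: the integer part is a signed 16 bit value, the fractional part is an unsigned 16 bit value, and the gain uses the S.15 format that the text gives only for the decompression command.

```python
def mix_double_precision(int_part: int, frac_part: int, sample: int, gain_s15: int):
    """Add sample*gain into an accumulator split into integer and fractional halves."""
    acc = (int_part << 16) + frac_part            # signed 16.16 accumulator
    acc += (sample * gain_s15) << 1               # product has 15 fractional bits; align to 16
    acc = max(-(1 << 31), min((1 << 31) - 1, acc))  # assumed saturation to 32 bits
    return acc >> 16, acc & 0xFFFF                # (integer part, fractional part)


hi, lo = mix_double_precision(0, 0, 3, 0x2000)    # add 3 * 0.25 = 0.75
print(hi, lo)                                     # 0 49152
hi, lo = mix_double_precision(hi, lo, 3, 0x2000)  # add another 0.75, giving 1.5
print(hi, lo)                                     # 1 32768
```

Keeping the fractional part between accumulations is what prevents small contributions from being rounded away when many low-gain voices are mixed.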
Special Audio Effects Processing :

If the next audio play list command is a special signal processing/effects command, signal processor 400 executes
the command by providing the specified special effect or signal processing (Figure 16, block 734). An example special
signal processing/effect is the addition of reverberation to create presence. This special effect simulates sound
reflection in caves, concert halls, etc., and can also be used for various other special effects. Signal processor 400 and
audio synthesis microcode 708 support the following example reverberation special effects command format and
associated function:

Example Effects Command :

A_REVERB:

command | flags | seg | address

This command applies the reverb special effect to a sample stream. Signal processor data memory 404 input, output
and number of samples are defined by issuing an A_SETBUFF command.

The flags define the behavior of the command. Currently defined flags are:

A_INIT, The seg+address field is used to restore state at the beginning of the command. If not set, the pointer
to state is ignored upon initiation; however, state is saved to this address at the end of processing.

A_MIX, The results are mixed into the output buffer. If not set, results are put into the output buffer.

Audio Processing Structure : 

To accomplish each of audio processing functions 728, 730, 732, 734 in this example, audio synthesis microcode
708 uses a general purpose effects implementation that manipulates data in a single delay line. Figure 17 shows an
example general purpose audio processing implementation 740. In this example, the audio input samples can be
conceived of as being applied to the input of contiguous single delay line 742. The output tap of the delay line is applied
through a gain 744 to the audio output buffer within signal processor data memory 404. Samples from another tap on
delay line 742 are passed through a summer 746 and returned to the delay line directly (over path 748) and also through
a coefficient block 750, another summer 752 and a low pass filter 754. A further tap 756 from delay line 742 is connected
to the other input of summer 752 and also to the other input of summer 746 (this time through a further coefficient block
758). This generalized implementation 740 allows a particular effect to be constructed by attaching an arbitrary number
of effect primitives to single delay line 742. The parameters for each primitive in the effect are passed through via the
commands discussed above. Each primitive consists of an all-pass with a variable length tap followed by a DC normalized
(unity gain at DC) single pole low-pass filter 754 followed by an output gain 744 specifying how much of this primitive's
output is to be contributed to the final effect output. The value of each of the parameters for a primitive specifies the
function of that primitive as a whole within the effect. Note that in Figure 17, the feedback coefficient 758 can be used
to construct an "all-pass inside a comb" reverb (in response to the A_REVERB command discussed above).
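The three stages of one effect primitive (all-pass, DC normalized single pole low-pass, output gain) can be sketched with textbook filter forms. This is a simplified illustration and not the Figure 17 topology: each all-pass here owns a private delay buffer rather than tapping a shared delay line, the filter equations are the standard Schroeder all-pass and one-pole low-pass, and all class and parameter names are hypothetical.

```python
from collections import deque


class AllPass:
    """Schroeder all-pass: y[n] = -g*v[n] + v[n-D], where v[n] = x[n] + g*v[n-D]."""

    def __init__(self, delay: int, g: float):
        self.g = g
        self.line = deque([0.0] * delay, maxlen=delay)

    def process(self, x: float) -> float:
        delayed = self.line[0]            # oldest entry, D samples old
        v = x + self.g * delayed
        self.line.append(v)               # discards the oldest entry
        return -self.g * v + delayed


class OnePoleLowPass:
    """y[n] = (1 - a)*x[n] + a*y[n-1]; the (1 - a) factor gives unity gain at DC."""

    def __init__(self, a: float):
        self.a = a
        self.y = 0.0

    def process(self, x: float) -> float:
        self.y = (1.0 - self.a) * x + self.a * self.y
        return self.y


def effect_primitive(samples, delay: int, g: float, a: float, out_gain: float):
    """All-pass, then low-pass, then output gain, applied sample by sample."""
    ap, lp = AllPass(delay, g), OnePoleLowPass(a)
    return [out_gain * lp.process(ap.process(x)) for x in samples]


print([AllPass(3, 0.5).process(x) for x in [1.0]])        # first impulse output: [-0.5]
print(effect_primitive([1.0, 0.0, 0.0, 0.0], 3, 0.5, 0.0, 1.0))
```

With the low-pass coefficient set to zero and unity output gain, the primitive reduces to the bare all-pass, which makes the impulse response easy to check by hand.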

The general nature of implementation 740 does not mean that all functions are implemented. Only those functions
which are driven by legitimate parameters actually generate audio command operations by signal processor 400. This
gives video game programmers a great degree of flexibility in defining an effect that is appropriate in terms of both
sonic quality and efficiency.

COPROCESSOR DISPLAY PROCESSOR 500 

Display processor 500 in this example rasterizes triangles and rectangles and produces high quality pixels that
are textured, anti-aliased and z-buffered. Figure 18 shows the overall processes performed by display processor 500.
Display processor 500 receives graphics display commands that, for example, specify the vertices, color, texture,
surface normal and other characteristics of graphics primitives to be rendered. In this example, display processor 500
can render lines, triangles, and rectangles. Typically, display processor 500 will receive the specifications for the
primitives it is to render from signal processor 400, although it is also possible for main processor 100 to specify these
commands directly to the display processor.

The first operation display processor 500 performs on an incoming primitive is to rasterize the primitive, i.e., to 
generate pixels that cover the interior of the primitive (Figure 18, block 550). Rasterize block 550 generates various 
attributes (e.g., screen location, depth, RGBA color information, texture coordinates and other parameters, and a cov- 
erage value) for each pixel within the primitive. Rasterize block 550 outputs the texture coordinates and parameters 
to a texture block 552. Texture block 552 accesses texture information stored within texture memory 502, and applies
("maps") a texel (texture element) of a specified texture within the texture memory onto each pixel outputted by
rasterize block 550. A color convert block 554 and a chroma keying block 556 further process the pixel value to provide
a texture color to a color combine block 558.

Meanwhile, rasterize block 550 provides a primitive color (e.g., as a result of shading) for the same pixel to color
combine block 558. Color combine block 558 combines these two colors to result in a single pixel color. This single
pixel color output may have fog applied to it by block 560 (e.g., to create the effect of a smoke filled room, or the less
extreme, natural effect of reducing color brilliance as an object moves further away from the viewer). The resulting
pixel color value is then blended by a block 562 with a pixel value framebuffer 118 stores for the same screen coordinate
location. An additional anti-alias/z-buffer operation 564 performs hidden surface removal (i.e., so closer opaque objects
obscure objects further away) and anti-aliasing (to remove jaggedness of primitive edges being approximated by a series
of pixels), and causes the new pixel value to be written back into framebuffer 118.
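The order of the color stages (combine, fog, blend) can be sketched for a single pixel as below. The specific arithmetic is assumed for illustration and is not given in the text: the combine stage multiplies the texture color by the primitive color, fog is a linear interpolation toward a fog color, and blending is a linear interpolation with the stored framebuffer color weighted by alpha. Color components are floating-point values between 0 and 1.

```python
def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def shade_pixel(texture_rgb, primitive_rgb, fog_rgb, fog_amount, alpha, framebuffer_rgb):
    """Run one pixel through combine -> fog -> blend and return the new color."""
    # Combine: texture color and primitive (shade) color become one color (assumed modulate).
    combined = tuple(t * p for t, p in zip(texture_rgb, primitive_rgb))
    # Fog: move the color toward the fog color by fog_amount.
    fogged = tuple(lerp(c, f, fog_amount) for c, f in zip(combined, fog_rgb))
    # Blend: mix with the color already stored for this screen location.
    return tuple(lerp(fb, c, alpha) for fb, c in zip(framebuffer_rgb, fogged))


print(shade_pixel((1.0, 0.5, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0),
                  0.0, 0.5, (0.5, 0.5, 0.5)))   # (0.75, 0.5, 0.25)
```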

The operations shown in Figure 18 are performed for each pixel within each primitive to be rendered. Many prim- 
itives may define a single complex scene, and each primitive may contain hundreds or thousands of pixels. Thus, 
display processor 500 must process millions of pixels for each image to be displayed on color television set 58. 

Typically, framebuffer 118 is "double buffered" -- meaning that it is sized to contain two complete television screen
images. Display processor 500 fills one screen worth of framebuffer information while video interface 210 reads from
the other half of the framebuffer 118. At the end of the video frame, the video interface 210 and display processor 500
trade places, with the video interface reading from the new image representation just completed by display processor
500 and the display processor rewriting the other half of the framebuffer. This double buffering does not give display
processor 500 any more time to complete an image; it must still finish the image in nominally one video frame time
(i.e., during the video frame time just prior to the frame time during which the new image is to be displayed).
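The "trade places" behavior of a double buffered framebuffer can be sketched with two pixel arrays and an index that flips at the end of each frame. The class and attribute names are hypothetical and the sketch says nothing about how the actual hardware selects the halves.

```python
class DoubleBufferedFramebuffer:
    """Two screen images: one being drawn, one being displayed."""

    def __init__(self, width: int, height: int):
        self.buffers = [[0] * (width * height) for _ in range(2)]
        self.draw_index = 0

    @property
    def draw_buffer(self) -> list:
        return self.buffers[self.draw_index]        # written by the display processor

    @property
    def display_buffer(self) -> list:
        return self.buffers[1 - self.draw_index]    # read by the video interface

    def swap(self) -> None:
        """Called at the end of a video frame."""
        self.draw_index = 1 - self.draw_index


fb = DoubleBufferedFramebuffer(4, 2)
fb.draw_buffer[0] = 7      # render into the hidden image
fb.swap()                  # end of frame: the finished image becomes visible
print(fb.display_buffer[0], fb.draw_buffer[0])   # 7 0
```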

Pipelining :

Because high speed operation is very important in rendering pixels, display processor 500 has been designed to
operate as a "pipeline." Referring again to Figure 18, "pipelining" means that the various steps shown in Figure 18 can
be performed in parallel for different pixels. For example, rasterize block 550 can provide a first pixel value to texture
block 552, and then begin working on a next pixel value while the texture block is still working on the first pixel value.
Similarly, rasterize block 550 may be many pixels ahead of the pixel that blend block 562 is working on.

In this example, display processor 500 has two different pipeline modes: one-cycle mode, and two-cycle mode. In 
one-cycle mode, one pixel is processed for each cycle time period of display processor 500. A one-cycle mode operation 
is shown in Figure 19A. Note that the operations shown in Figure 19A are themselves pipelined (i.e., the blend operation
562 operates on a different pixel than the rasterize operation 550 is currently rasterizing), but the overall operation 
sequence processes one pixel per cycle. 

Figure 19B shows the two-cycle pipeline mode operation of display processor 500 in this example. In the Figure
19B example, some of the operations shown in Figure 18 are performed twice for each pixel. For example, the texture
and color convert/filtering operations 552, 554 shown in Figure 18 are repeated for each pixel; the color combine
operation 558 is performed twice (once for the texture color output of one texture operation, and once for the texture
color output of the other texture operation). Similarly, blend operation 562 shown in Figure 18 is performed twice for
each pixel.

Even though these various operations are performed twice, display processor 500 in this example does not contain
duplicate hardware to perform the duplicated operations concurrently (duplicating such hardware would have increased
cost and complexity). Therefore, in this example, display processor 500 duplicates an operation on a pixel by processing
it with a particular circuit (e.g., a texture unit, a color combiner or a blender), and then using the same circuit again to
perform the same type of operation again for the same pixel. This repetition slows down the pipeline by a factor of two
(each pixel must "remain" at each stop in the pipeline for two cycles instead of one), but allows more complicated
processing. For example, because the two-cycle-per-pixel mode can map two textures onto the same pixel, it is possible
to do "trilinear" ("mipmapping") texture mapping. In addition, since in this example, display processor 500 uses the
same blender hardware to perform both the fog operation 560 and the blend operation 562 (but cannot both blend and
fog simultaneously), it is generally necessary to operate in the two-cycle-per-pixel mode to provide useful fog effects.

The following tables summarize the operations performed by the various blocks shown in Figures 19A and 19B
during the one-cycle and two-cycle modes:



Display Processor Pipeline Block Functionality in One-Cycle Mode

Block | Functionality
Rasterize 550 | Generates pixel and its attributes covered by the interior of the primitive.
Texture 552 | Generates 4 texels nearest to this pixel in a texture map.
Filter Texture 554 | Bilinear filters 4 texels into 1 texel, OR performs step 1 of YUV-to-RGB conversion.
Combine 558 | Combines various colors into a single color, OR performs step 2 of YUV-to-RGB conversion.
Blend 562 | Blends the pixel with framebuffer memory pixel, OR fogs the pixel for writing to framebuffer.
Framebuffer 563 | Fetches and writes pixels (color and z) from and to the framebuffer memory.



Display Processor Pipeline Block Functionality in Two-Cycle Mode

Block | Functionality
Rasterize 550 | Generates pixel and its attributes covered by the interior of the primitive.
Texture 552a | Generates 4 texels nearest to this pixel in a texture map. This can be level X of a mipmap.
Texture 552b | Generates 4 texels nearest to this pixel in a texture map. This can be level X+1 of a mipmap.
Filter Texture 554a | Bilinear filters 4 texels into 1 texel.
Filter Texture 554b | Bilinear filters 4 texels into 1 texel.
Combine 558a | Combines various colors into a single color, OR linear interpolates the 2 bilinear filtered texels from 2 adjacent levels of a mipmap, OR performs step 2 of YUV-to-RGB conversion.
Combine 558b | Combines various colors into a single color, OR chroma keying.
Blend 562a | Combines fog color with resultant CC1 color.
Blend 562b | Blends the pipeline pixels with framebuffer memory pixels.
Framebuffer 563a | Read/modify/write color memory; and
Framebuffer 563b | Read/modify/write Z memory.
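The two-cycle texture path in the table (four texels from each of two mipmap levels, a bilinear filter per level, then a linear interpolation between the two filtered results) can be sketched for a single color component. The parameter names are hypothetical, and reusing the same fractional coordinates for both mipmap levels is a simplification made for illustration.

```python
def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def bilinear(t00: float, t10: float, t01: float, t11: float, fs: float, ft: float) -> float:
    """Filter 4 neighboring texels into 1 using fractional coordinates fs, ft."""
    top = lerp(t00, t10, fs)
    bottom = lerp(t01, t11, fs)
    return lerp(top, bottom, ft)


def trilinear(level_x, level_x1, fs: float, ft: float, lod_frac: float) -> float:
    """Bilinear filter each level, then interpolate between levels X and X+1."""
    a = bilinear(*level_x, fs, ft)       # texture/filter pass for level X
    b = bilinear(*level_x1, fs, ft)      # texture/filter pass for level X+1
    return lerp(a, b, lod_frac)          # combine pass


print(bilinear(0, 10, 20, 30, 0.5, 0.5))                              # 15.0
print(trilinear((0, 10, 20, 30), (20, 30, 40, 50), 0.5, 0.5, 0.25))   # 20.0
```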



Fill and Copy Operations :

Display processor 500 also has a "fill" mode and a "copy" mode, each of which processes four pixels per cycle. The
fill mode is used to fill an area of framebuffer 118 with identical pixel values (e.g., for high performance clearing of the
framebuffer or an area of it). The copy mode is used for high-performance image-to-image copying (e.g., from display
processor texture memory 502 into a specified area of framebuffer 118). The copy mode provides a bit "blit" operation
in addition to providing high performance copying in the other direction (i.e., from the framebuffer into the texture
memory).
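The relative throughput of the four modes can be compared with a rough cycle count. This sketch uses only the per-mode rates stated in this description (one pixel per cycle, one pixel per two cycles, and four pixels per cycle for fill and copy) and ignores pipeline latency and memory stalls; the 320 x 240 area is an assumed size chosen for illustration.

```python
import math

# Pixels processed per display processor cycle in each mode.
PIXELS_PER_CYCLE = {"one_cycle": 1.0, "two_cycle": 0.5, "fill": 4.0, "copy": 4.0}


def cycles_needed(pixels: int, mode: str) -> int:
    """Idealized cycle count for processing `pixels` pixels in the given mode."""
    return math.ceil(pixels / PIXELS_PER_CYCLE[mode])


area = 320 * 240   # assumed area, for illustration only
for mode in ("one_cycle", "two_cycle", "fill", "copy"):
    print(mode, cycles_needed(area, mode))
```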

The pipeline operations shown in Figures 19A and 19B are largely unused during the fill and copy modes, because
in this example, the operations cannot keep up with the pixel fill or copy rate. However, in this example, an "alpha
compare" operation (part of blend operation 562) is active in the copy mode to allow display processor 500 to "blit" an
image into framebuffer 118 and conditionally remove image pixels with the word alpha=0 (e.g., transparent pixels).
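Copying an image while skipping pixels whose alpha is zero can be sketched as follows. The pixel representation as (r, g, b, a) tuples, the row-major list layout and the function name are assumptions made for illustration.

```python
def blit(framebuffer: list, fb_width: int, image: list, img_width: int,
         dst_x: int, dst_y: int) -> None:
    """Copy `image` into `framebuffer` at (dst_x, dst_y), skipping alpha == 0 pixels."""
    for i, pixel in enumerate(image):
        if pixel[3] == 0:
            continue                      # alpha compare: transparent pixel is dropped
        x = dst_x + i % img_width
        y = dst_y + i // img_width
        framebuffer[y * fb_width + x] = pixel


black = (0, 0, 0, 255)
fb = [black] * (4 * 2)                    # 4 x 2 framebuffer
sprite = [(255, 0, 0, 255), (0, 255, 0, 0)]   # opaque red, transparent green
blit(fb, 4, sprite, 2, 1, 1)
print(fb[5], fb[6])                       # red written, neighbor left unchanged
```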

The display processor's mode of operation is selected by sending the display processor 500 a "set other mode"
command specifying a "cycle type" parameter. See Appendix A. In the one-cycle-per-pixel or two-cycle-per-pixel
pipeline modes, additional display processor 500 commands are available to ensure that pipeline synchronization is
maintained (e.g., so that the pipeline is emptied of one primitive before the parameters of another primitive take effect).
See "Sync Pipe" command set forth in Appendix A.

EXAMPLE DISPLAY PROCESSOR 500 ARCHITECTURE

Figure 20 shows an example architecture of display processor 500. In this example, display processor 500 includes
a command unit 514 with associated RAM 516 and DMA controller 518; an "edge walker"/rasterizer 504; an RGBAZ
pixel stepper 520; a color combiner/level interpreter 508; a blender/fogger 510; a ditherer 522; a coverage evaluator
524; a depth (z) comparator 526; a memory interface 512; and a texture unit 506. In this case, texture unit 506 includes,
in addition to texture memory 502, texture steppers 528, a texture coordinate unit 530 and a texture filter unit 532.

Command unit 514 and DMA controller 518 connect to coprocessor main internal bus 214, and also connect to
the signal processor 400 via a private "x" bus 218. Memory interface 512 is a special memory interface for use by
display processor 500 primarily to access the color framebuffer 118a and the z buffer 118b stored within main memory
300 (thus, display processor 500 has access to main memory 300 via memory interface 512 and also via coprocessor
internal bus 214).

DMA Controller : 

DMA controller 518 receives DMA commands from signal processor 400 or main processor 100 over bus 214.
DMA controller 518 has a number of read/write registers shown in Figures 21A-21C that allow signal processor 400
and/or main processor 100 to specify a start and end address in SP data memory 404 or main memory 300 from which
to read a string of graphics display commands (Figure 21A shows a start address register 518a, and Figure 21B shows
an end address register 518b). DMA controller 518 reads data over main coprocessor bus 214 if registers 518a, 518b
specify a main memory 300 address, and it reads data from the signal processor's data memory 404 over private "x
bus" 218 if the registers 518a, 518b specify a data memory 404 address. DMA controller 518 also includes a further
register (register 518c shown in Figure 21C) that contains the current address DMA controller 518 is reading from. In
this example, DMA controller 518 is uni-directional -- that is, it can only write from bus 214 into RAM 516. Thus, DMA
controller 518 is used in this example for reading from signal processor 400 or main memory 300. In this example,
display processor 500 obtains data for its texture memory 502 by passing texture load commands to command unit
514 and using memory interface 512 to perform those commands.

Command Unit : 

Command unit 514 retains much of the current state information pertaining to display processor 500 (e.g., mode
and other selections specified by "set commands"), and outputs attributes and command control signals to specify and
determine the operation of the rest of display processor 500. Command unit 514 includes some additional registers
that may be accessed by main processor 100 (or signal processor 400) via coprocessor bus 214. These additional
registers, which are mapped into the address space of main processor 100, permit the main processor to control and
monitor display processor 500.

For example, command unit 514 includes a status/command register 534 shown in Figure 21D that acts as a status
register when read by main processor 100 and acts as a command register when the main processor writes to it. When
reading this register 534, main processor 100 can determine whether display processor 500 is occupied performing a
DMA operation reading from signal processor data memory 404 (field 536(1)); whether the display processor is stalled
waiting for access to main memory 300 (field 536(2)); whether the display processor pipeline is being flushed (field
536(3)); whether the display processor graphics clock is started (field 536(4)); whether texture memory 502 is busy (field
536(5)); whether the display processor pipeline is busy (field 536(6)); whether command unit 514 is busy (field 536(7));
whether the command buffer RAM 516 is ready to accept new inputs (field 536(8)); whether DMA controller 518 is busy
(field 536(9)); and whether the start and end addresses in registers 518a and 518b, respectively, are valid (fields
536(10), 536(11)). When writing to this same register 534, main processor 100 (or signal processor 400) can clear an X-bus
DMA operation from the signal processor 400 (field 538(1)); begin an X-bus DMA operation from signal processor data
memory 404 (field 538(2)); start or stop the display processor (fields 538(3), 538(4)); start or stop a pipeline flushing
operation (fields 538(5), 538(6)); clear a texture memory address counter 540 shown in Figure 21H (field 538(7)); clear
a pipeline busy counter 542 shown in Figure 21F (field 538(8)); clear a command counter 544 used to index command
buffer RAM 516 (field 538(9)) (the counter 544 is shown in Figure 21G); and clear a clock counter 546 (see Figure 21E)
used to count clock cycles (field 538(10)).
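Reading the status side of register 534 can be sketched as decoding a word into named flags. The bit positions are purely an assumption made for illustration (field 536(n) is placed at bit n-1); the actual layout is defined by Figure 21D, which is not reproduced here. The field names are hypothetical labels derived from the descriptions above.

```python
# One name per status field 536(1) .. 536(11), in order.
STATUS_FIELDS = [
    "xbus_dmem_dma",           # 536(1)  DMA from SP data memory in progress
    "stalled_on_main_memory",  # 536(2)
    "pipeline_flushing",       # 536(3)
    "graphics_clock_started",  # 536(4)
    "texture_memory_busy",     # 536(5)
    "pipeline_busy",           # 536(6)
    "command_unit_busy",       # 536(7)
    "command_buffer_ready",    # 536(8)
    "dma_busy",                # 536(9)
    "start_address_valid",     # 536(10)
    "end_address_valid",       # 536(11)
]


def decode_status(word: int) -> dict:
    """Map each status field to True/False, assuming field n sits at bit n-1."""
    return {name: bool((word >> bit) & 1) for bit, name in enumerate(STATUS_FIELDS)}


def is_busy(status: dict) -> bool:
    """True if any of the 'busy' style fields is set."""
    return any(status[name] for name in
               ("texture_memory_busy", "pipeline_busy", "command_unit_busy", "dma_busy"))


status = decode_status(0b00000100101)
print(status["xbus_dmem_dma"], status["pipeline_flushing"], is_busy(status))
```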

As mentioned above, the clock count, buffer count, pipeline count and texture memory count can all be read directly
from registers 540-546 (see Figures 21E-21H). In addition, main processor 100 or signal processor 400 can read and
control the BIST operation pertaining to texture memory 502 (see BIST status/control register 548 shown in Figure 21I),
and can also enable and control testing of memory interface 512 by manipulating mem span test registers 549(a),
549(b) and 549(c) shown in Figure 21J.

Referring back to Figure 20, once one or more commands have been loaded into command unit buffer RAM 516
and display processor 500 has been started, command unit 514 begins reading and processing each command
sequentially. The repertoire of commands display processor 500 understands is set forth in Appendix A. Hardware (e.g.,
logic, gate arrays and the like) within display processor 500 directly interprets the graphics display commands within
RAM 516. In this example, display processor 500 has no ability to branch or jump in traversing this list of commands.
Rather, display processor 500 in this example is a sequential state machine that accepts each new command as an
input in strict sequence and alters its states and outputs in response to the command.

Display processor 500 halts if its command buffer RAM 516 is empty (i.e., it has processed all of the commands
in the buffer, which buffer acts as a FIFO). Main processor 100 or signal processor 400 can determine if display
processor 500 has halted by reading display processor status register 534 and may, if desired, pass the display processor
a command that stalls the display processor temporarily (see "Sync Full" command in Appendix A).

Edgewalker and Steppers:

Edgewalker 504 shown in Figure 20 performs the rasterize process 550 shown in Figure 18. In this example,
edgewalker 504 receives the edge coefficients, shade coefficients, texture coefficients and z buffer coefficients specified
in a "triangle command" (see Appendix A specifying a particular primitive open line, triangle or rectangle), and outputs
"span" values from which the following attributes for each pixel enclosed within the primitive can be derived:

• screen x, y location

• z depth for z buffer purposes

• RGBA color information

• s/w, t/w, 1/w texture coordinates, level-of-detail for texture index, perspective correction, and mipmapping (these
are commonly referred to as s, t, w, l)

• coverage value (pixels on the edge of a primitive have partial coverage values, whereas pixels within the interior
of a primitive have full coverage).

Edgewalker 504 sends the parameters for a line of pixels across the primitive (a "span") to the pipeline hardware
downstream for other computations. In particular, texture steppers 528 and RGBAZ steppers 520 receive the "span"
information specified by edgewalker 504, and step sequentially along each pixel in the horizontal line (in the view plane
coordinate system) of the "span" to derive the individual texture coordinates and RGBAZ values for each individual
pixel in the span.
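
The stepping of attributes along a span can be sketched as follows. This is an illustration only: it assumes the span is expressed as a set of starting values plus a per-pixel increment for each attribute, which is one plausible reading of the "span" information and is not taken from Appendix A.

```python
def step_span(x_start, x_end, start_values, deltas):
    """Yield (x, values) for each pixel of one horizontal span.

    start_values holds the attribute values at x_start; deltas holds the
    assumed per-pixel change of each attribute along x.
    """
    values = list(start_values)
    for x in range(x_start, x_end + 1):
        yield x, tuple(values)
        # advance every attribute by its increment for the next pixel
        values = [v + d for v, d in zip(values, deltas)]
```

For example, stepping a (red, s) pair across pixels 10 through 13 with increments (-10, 2) yields one attribute tuple per pixel, each derived from the previous one by a single addition.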

The RGBAZ stepper 520 may also perform a "scissoring" operation on triangle primitives (this does not work for
rectangles in this example) to efficiently eliminate portions of triangle primitives extending outside of a view plane
scissoring rectangle. Scissoring is commonly used to avoid running performance-intensive clipping operations on
signal processor 400. Scissoring is similar in concept to clipping, but whereas clipping is performed in the 3-D coordinate
system, scissoring is performed in the 2-D coordinate system of the viewing plane. Scissoring by steppers 520, 528
is invoked by sending display processor 500 a "set scissor" command (see Appendix A).
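
Because scissoring operates in the 2-D coordinate system of the viewing plane, it reduces to trimming each span against a rectangle. The following is a minimal sketch under that assumption; the function name and the inclusive (xmin, ymin, xmax, ymax) rectangle convention are hypothetical.

```python
def scissor_span(y, x_start, x_end, rect):
    """Trim one span to a scissor rectangle (xmin, ymin, xmax, ymax), inclusive.

    Returns the trimmed (x_start, x_end), or None if no pixel survives.
    """
    xmin, ymin, xmax, ymax = rect
    if y < ymin or y > ymax:
        return None                      # the whole scan line is outside
    lo, hi = max(x_start, xmin), min(x_end, xmax)
    return (lo, hi) if lo <= hi else None
```

No 3-D geometry is recomputed here, which is what makes scissoring inexpensive compared with clipping.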

As mentioned above, RGBAZ steppers 520 produce color and alpha information for each pixel within the "span" defined
by edgewalker 504. Similarly, texture steppers 528 produce texture coordinate values (s, t, w) for each pixel within
the span. Steppers 520, 528 operate in a synchronized fashion so that texture unit 506 outputs a mapped texture value
for a pixel to color combiner 508 at the same time that the RGBAZ steppers 520 output a color value for the same pixel
based on primitive color, shading, lighting, etc.

Texture Unit:

Texture unit 506 in this example takes the texture coordinates s, t, w and level-of-detail values for a pixel (as
mentioned above, texture steppers 528 derive these values for each individual pixel based upon "span" information
provided by edgewalker 504), and fetches appropriate texture information from onboard texture memory 502 for mapping
onto the pixel. In this example, the four nearest texels to the screen pixel are fetched from texture memory 502,
and these four texel values are used for mapping purposes. Video game program 108 can manipulate texture states
such as texture image types and formats, how and where to load texture images, and texture sampling attributes.

Texture coordinate unit 530 computes appropriate texture coordinates for mapping texture stored within texture
memory 502 onto the primitive being rendered. Since the 2-dimensional textures stored in texture memory 502 are
square or rectangular images that must be mapped onto triangles of various sizes, the texture coordinate unit 530 must
select appropriate texels within the texture to map onto pixels in the primitive to avoid distorting the texture. See OpenGL
Programming Guide at 278.

Texture coordinate unit 530 computes a mapping between the inputted pixel texture coordinates and four texels
within the appropriate texture stored in texture memory 502. Texture coordinate unit 530 then addresses the texture
memory 502 appropriately to retrieve these four texels. The four texel values are passed to the texture filter unit 532.
Texture filter 532 takes the four texels retrieved from texture memory 502 and produces a simple bilinear-filtered texel.
Texture filter 532 in this example can perform three types of filter operations: point sampling, box filtering, and bilinear
interpolation. Point sampling selects the nearest texel to the screen pixel. In the special case where the screen pixel
is always the center of four texels, the box filter can be used. In the case of the typical 3-D, arbitrarily rotated polygon,
bilinear filtering is generally the best choice available. For hardware cost reduction, display processor texture filter unit
532 does not implement a true bilinear filter. Instead, it linearly interpolates the three nearest texels to produce the
result pixels. This has a natural triangulation bias which is not noticeable in normal texture images but may be noticed
in regular pattern images. This artifact can be eliminated by prefiltering the texture image with a wider filter. The type
of filtering performed by texture filter unit 532 is set using parameters in the "set modes" display command (see Appendix
A).
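
The difference between a true bilinear filter and interpolation over the three nearest texels can be sketched as follows. This is an illustrative reconstruction, not the circuit of filter unit 532: it assumes the 2x2 texel quad is split along its diagonal and that the triangle containing the sample point selects which three texels are used. The names t00, t10, t01, t11 (texels at the quad corners) and fs, ft (fractional position in S and T) are assumptions.

```python
def bilinear(t00, t10, t01, t11, fs, ft):
    """True bilinear filter over a 2x2 texel quad (fs, ft in [0, 1])."""
    top = t00 + fs * (t10 - t00)
    bottom = t01 + fs * (t11 - t01)
    return top + ft * (bottom - top)

def three_texel(t00, t10, t01, t11, fs, ft):
    """Linearly interpolate the three texels nearest the sample point."""
    if fs + ft <= 1.0:   # sample lies in the triangle containing t00
        return t00 + fs * (t10 - t00) + ft * (t01 - t00)
    # otherwise the sample lies in the triangle containing t11
    return t11 + (1.0 - fs) * (t01 - t11) + (1.0 - ft) * (t10 - t11)
```

On a smooth ramp the two filters agree, but on a checkerboard quad such as (1, 0, 0, 1) sampled at its center the bilinear result is 0.5 while the three-texel result is 0.0, which is the kind of triangulation bias the text says may be noticed in regular pattern images.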

TEXTURE MEMORY 502 

Display processor 500 treats texture memory 502 as a general-purpose texture memory. In this example, texture
memory 502 is divided into four simultaneously accessible banks, giving output of four texels per clock cycle. Video
game program 108 can load varying-sized textures with different formats anywhere in the texture memory 502. Texture
coordinate unit 530 maintains eight texture tile descriptors that describe the location of texture images within texture
memory 502, the format of each texture, and its sampling parameters. This allows display processor 500 to access as
many as eight different texture tiles at a time (more than eight texture tiles can be loaded into the texture memory, but
only eight tiles are accessible at any time).

Figure 22 shows an example of the texture tile descriptors and their relationship to texture tiles stored in texture
memory 502. In this particular example shown in Figure 22, eight different texture tiles 802 are stored within texture
memory 502. Each texture tile 802 has an associated texture tile descriptor block 804 (as discussed above, display
processor 500 maintains up to eight descriptors 804 corresponding to eight texture tiles stored within texture memory
502). The texture descriptors contain information specified by a "set tile" command (see Appendix A). For example,
these texture tile descriptors specify the image data format (RGBA, YUV, color index mode, etc.), the size of each
pixel/texel color element (four, eight, sixteen, thirty-two bits), the size of the tile line in 64-bit words, the starting address
of the tile in texture memory 502, a palette number for 4-bit color indexed texels, clamp and mirror enables for each
of the S and T directions, masks for wrapping/mirroring in each of S and T directions, and level of detail shifts for each of
S and T addresses. These descriptors 804 are used by texture coordinate unit 530 to calculate addresses of texels
within the texture memory 502.

Texture Coordinate Unit:

Figure 23 shows a more detailed example of the processing performed by texture coordinate unit 530. Figure 23
shows the various tile descriptors 804 being applied as inputs to texture coordinate unit 530. Figure 23 also shows
that texture coordinate unit 530 receives the primitive tile/level/texture coordinates for the current pixel from texture
steppers 528. Texture coordinate unit 530 additionally receives mode control signals from command unit 514 based,
for example, on the "set other modes" and "set texture image" commands (see Appendix A). Based on all of this input
information, texture coordinate unit 530 calculates which tile descriptor 804 to use for this primitive, and converts the
inputted texture image coordinates to tile-relative coordinates which the texture coordinate unit wraps, mirrors and/or
clamps as specified by the tile descriptor 804. Texture coordinate unit 530 then generates an offset into texture memory
502 based on these tile coordinates. The texture coordinate unit 530 in this example can address 2x2 regions of
texels in one or two cycle mode, or 4x1 regions in copy mode. Texture coordinate unit 530 also generates S/T/L
fraction values that are used to bi-linearly or tri-linearly interpolate the texels.
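
The wrap, mirror and clamp handling of a tile-relative coordinate can be sketched for a single axis as follows. This is an illustration under stated assumptions: the tile size is taken to be a power of two so that wrapping is a bit mask, and clamping is given precedence over wrapping and mirroring. The actual descriptor fields and their interaction are defined by the "set tile" command and are not reproduced here.

```python
def tile_coord(coord, size, clamp=False, mirror=False):
    """Map an integer texel coordinate into a tile `size` texels wide.

    `size` is assumed to be a power of two when wrapping or mirroring,
    so that wrapping reduces to a bit mask.
    """
    if clamp:
        return min(max(coord, 0), size - 1)
    wrapped = coord & (size - 1)             # wrap with a mask
    if mirror and (coord // size) & 1:       # odd repetitions run backwards
        return size - 1 - wrapped
    return wrapped
```

The same function would be applied independently in the S and T directions, each with its own enables and mask.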

Figure 24 is a detailed diagram of texture coordinate unit 530 and texture memory unit 502. As shown in Figure
24, the incoming s, t, w texture coordinates are inputted into a perspective correction block 566 which provides a
perspective correction based on w when perspective correction is enabled. The perspective-corrected s, t values are
then provided to a level-of-detail or precision shift block 568 which shifts the texture coordinates after perspective
divide (e.g., for MIP mapping and possibly for precision reasons). A block 570 then converts the shifted texture
coordinates to tile coordinates, providing fractional values to the texture filter unit 532. These tile coordinate values are
then clamped, wrapped and/or mirrored by block 572 based on the current texture mode parameters of display
processor 500. Meanwhile, the perspective-corrected texture coordinates provided by perspective correction block 566 are
also provided to a level of detail block 574 which, when level of detail calculations are enabled, calculates a tile descriptor
index into a tile descriptor memory 576 and also calculates a level of detail fractional value for interpolation by the color
combiner 508. The tile descriptors 804 are stored in tile descriptor memory 576, and are retrieved and outputted to a
memory conversion block 578 which conversion block also receives the adjusted texture coordinate values of block
572. Address conversion block 578 converts the adjusted texture coordinate values into texture memory unit addresses
based on current tile size, format and other parameters as specified by the tile descriptor 804. Address conversion
block 578 outputs the texel address to texture memory unit 502. The texture memory unit 502 also receives additional
parameters which are used, for example, if the texture is color indexed. Texture memory unit 502 outputs four texel
values to texture filter unit 532 for filtering as discussed above.

Texture Memory Loading : 

Texture memory unit 502 includes a four kilobyte random access memory onboard coprocessor 200. Because
texturing requires a large amount of random accesses with consistent access time, it is impractical to texture directly
from main memory 300 in this example. The approach taken is to cache up to four kilobytes of an image in on-chip,
high-speed texture memory 502. All primitives can be textured using the contents of texture memory 502.

In order to use texture memory 502, video game program 108 must load a texture tile into the texture memory and
then load the associated descriptor 804 into tile descriptor memory 576. The "load tile" command (see Appendix A) is
used to load a tile into texture memory 502, and a "set tile" and "set tile size" command are used to load corresponding
tile descriptor blocks 804 into tile descriptor memory 576. In addition, a "Load Tlut" command (see Appendix A) can be
used to load a color lookup table into texture memory 502 for use by color indexed textures.

Physically, texture memory 502 is organized in four banks, each comprising 256 16-bit wide words, each bank
having a low half and a high half. This organization can be used to store 4-bit textures (twenty texels per row), 8-bit
textures (ten texels per row), 16-bit textures (six texels per row), 16-bit YUV textures (twelve texels per row), and 32-bit
textures (six texels per row). In addition, texture unit 506 in this example supports a color-indexed texture mode in
which the high half of texture memory 502 is used to store a color lookup table and the low half of the texture memory
is used to store 4-bit or 8-bit color indexed textures. This organization is shown in Figure 25. In this Figure 25 example,
a color indexed texture tile 580 is stored in a low half 502(L) of texture memory 502, and a corresponding color lookup
table 582 is stored in the upper half 502(H) of the texture memory.

Figure 26 shows a more detailed depiction of a particular texture memory color indexed mode, in which the color
lookup table 582 is divided into four palette banks 584 or tables, each having, for example, sixteen entries, each entry
being 16 bits wide. The color lookup table may represent color in 16-bit RGBA format, or in 16-bit IA format. Since four
texels are addressed simultaneously, there are four (usually identical) lookup tables 584 stored in the upper half of
texture memory 502. As mentioned above, these lookup tables are loaded using the "Load Tlut" command shown in
Appendix A.
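
The lookup of a 4-bit color-indexed texel can be sketched as follows. This is an assumption-laden illustration: the lookup table is modeled as a flat list of 16-bit entries in which each palette occupies sixteen consecutive entries, and two 4-bit texels are assumed to be packed per byte with the high nibble first. Neither layout is taken from Figure 26 or Appendix A.

```python
def lookup_ci4(tlut, palette, index):
    """Return the 16-bit color for a 4-bit texel index within a palette.

    tlut is a flat list of 16-bit entries; each palette is assumed to
    occupy sixteen consecutive entries.
    """
    if not 0 <= index < 16:
        raise ValueError("a 4-bit index must be in 0..15")
    return tlut[palette * 16 + index]

def unpack_ci4(byte):
    """Split one byte into two 4-bit texel indices (high nibble first, assumed)."""
    return (byte >> 4) & 0xF, byte & 0xF
```

The palette number would come from the tile descriptor, so the same 4-bit texel data can be recolored by selecting a different palette.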

Display processor 500 supports another color-indexed texture mode in which each texel in the lower half of texture
memory 502 comprises eight bits and therefore can directly access any one of the 256 locations in the upper half 502(H)
of texture memory 502. Thus, 8-bit color-indexed textures do not use the palette number of the tile, since they
address the whole 256-element lookup table directly. It is not necessary to use the entire upper half of texture memory
502 for a lookup table when using 8-bit color-indexed textures. For example, if fewer than eight of the bits of the 8-bit
color-indexed texture tile are being used for color lookup, only a portion of color memory upper half 502(H) is required
to store the lookup table, and the remainder of the upper half of the texture memory 502 might thus be used for storing
a non-color-indexed texture such as a 4-bit I texture (see Figure 25). Similarly, even when color-indexed texture 580
is stored in the lower half 502(L) of texture memory 502, it is possible to also store non-color-indexed textures in the
lower half as well. Thus, color-indexed textures and non-color-indexed textures can be co-resident in texture memory
502.

The following texture formats and sizes are supported by texture memory 502 and texture coordinate unit 530:

Texture Format and Sizes

Type                    4-bit    8-bit    16-bit    32-bit
RGBA                                      X         X
YUV                                       X
Color Index             X        X
Intensity Alpha (IA)    X        X        X
Intensity (I)           X        X

In this example, texture unit 506 will, unless explicitly told otherwise, change a tile descriptor 804 or a texture tile
802 immediately upon loading, even if it is still being used for texture mapping of a previous primitive. Texture loads
after primitive rendering should be preceded by a "sync load" command and tile descriptor attribute changes should
be preceded by a "sync tile" command to ensure that the texture tile and tile descriptor state of texture unit 506 does
not change before the last primitive is completely finished processing (see Appendix A for example formats and
functions of these commands).

As mentioned above in connection with the signal processor 400, two special commands ("texture rectangle" and 
"texture rectangle flip") can be used to map a texture onto a rectangle primitive (see Appendix A). It is possible to use 
the "texture rectangle" command to copy an image from texture memory 502 into frame buffer 118, for example. See 
Appendix A. 

COLOR COMBINER 

Referring once again to Figure 20, color combiner 508 combines texels outputted by texture unit 506 with stepped 
RGBA pixel values outputted by RGBAZ steppers 520. Color combiner 508 can take two color values from many 
sources and linearly interpolate between them. The color combiner 508 performs the equation: 

newcolor = (A-B) * C + D 

Here, A, B, C and D can come from many different sources (note that if D = B, then color combiner 508 performs simple 
linear interpolation). 
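
The combiner equation can be written out directly for a single color component. The sketch below is illustrative only; the function names are hypothetical, and the selection of the A, B, C and D sources is left to the caller.

```python
def combine(a, b, c, d):
    """Evaluate newcolor = (A - B) * C + D for one color component."""
    return (a - b) * c + d

def lerp(a, b, t):
    """Linear interpolation from b (t = 0) to a (t = 1), obtained with D = B."""
    return combine(a, b, t, b)
```

Choosing B = D = 0 turns the same equation into a plain multiply (A * C), which is how a texel can be modulated by a shade color with no additional hardware.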

Figure 27 shows possible input selection of a general purpose linear interpolator color combiner 508 for RGB and
Alpha color combination in this example. As can be seen in Figure 27, only some of the inputs in the lefthand column
come from texture unit 506 or RGBAZ steppers 520. The rest of the inputs are derived from color combiner 508 internal
state that can be programmed by sending commands to display processor 500. As discussed above, the "combined
color" and "combined Alpha" values provided to color combiner 508 are obtained from the RGBAZ steppers 520, and
the texel color and texture Alpha are obtained from texture unit 506 (two texel colors and corresponding Alpha values
are shown since in two-cycle-per-pixel mode two texels will be provided by texture unit 506 for purposes of mipmapping
for example). Additionally, the level of detail fractional input is obtained from Figure 24 block 574, and the primitive
level of detail value along with the primitive color and primitive Alpha value may be obtained from a "set primitive color"
command sent to display processor 500 (see Appendix A) (the primitive color value/alpha/level of detail fraction value
can be used to set a constant polygon face color). Similarly, a shade color and associated Alpha value may be obtained
from a "shade coefficient" command (see Appendix A), and an environment color and associated Alpha value may be
obtained from a "set environment color" command (see Appendix A) (the environment color/alpha value described
above can be used to represent the ambient color of the environment). Two kinds of "set key" commands (one for
green/blue, the other for red) are used for green/blue color keying and red color keying respectively, these supplying
the appropriate key:center and key:scale inputs to color combiner 508 (see Appendix A). Both the primitive and
environment values are programmable and thus can be used as general linear interpolation sources.

Convert K4 and K5 inputs to color combiner 508 are specified in this example by the "set convert" command (see
Appendix A) that adjusts red color coordinates after conversion of texel values from YUV to RGB format (the remainder
of the conversion process responsive to this set convert command being performed within texture filter unit 532).

Figure 28 shows a portion of color combiner 508 used for combining the alpha values shown as inputs in Figure
27. For both the RGB color combine and alpha color combine operations performed by color combiner 508, there are
two modes, one for each of the two possible pipeline modes (one cycle-per-pixel and two cycles-per-pixel). In the
two-cycle mode, color combiner 508 can perform two linear interpolation arithmetic computations. Typically, the second
cycle is used to perform texture and shading color modulation (i.e., the operations color combiner 508 is typically
used for exclusively in the one-cycle mode), and the first cycle can be used for another linear interpolation calculation
(e.g., level of detail interpolation between two bi-linear filtered texels from two mipmap tiles). Color combiner 508 also
performs the "alpha fix-up" operation shown in Figure 29 in this example (see "set key GB" command in Appendix A).
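
The typical two-cycle usage described above can be sketched by applying the combiner equation twice. This is a hedged illustration for one color component: the input names (texel0, texel1, lod_frac, shade) are hypothetical labels for the sources the text mentions, and the particular source selection shown is only one possible programming of the combiner.

```python
def combine(a, b, c, d):
    """Evaluate (A - B) * C + D for one color component."""
    return (a - b) * c + d

def two_cycle(texel0, texel1, lod_frac, shade):
    """Cycle 0 blends two mipmap texels; cycle 1 modulates by the shade color."""
    # first cycle: level-of-detail interpolation between the two filtered texels
    combined = combine(texel1, texel0, lod_frac, texel0)
    # second cycle: modulation, i.e. combined * shade (B = D = 0)
    return combine(combined, 0.0, shade, 0.0)
```

In one-cycle mode only the second step would be performed, with a single texel taking the place of the combined value.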

Blender

As discussed above, blender 510 takes the combined pixel value provided by color combiner 508 and blends it
against the frame buffer 118 pixels. Transparency is accomplished by blending against the frame buffer color pixels.
Polygon edge antialias is performed, in part, by blender 510 using conditional color blending based on depth (z) range.
The blender 510 can also perform fog operations in two-cycle mode.

Blender 510 can perform different conditional color-blending and z buffer updating, and therefore can handle all 
of the various types of surfaces shown in Figure 30 (i.e., opaque surfaces, decal surfaces, transparent surfaces, and 
interpenetrating surfaces). 

An important feature of blender 510 is its participation in the antialias process. Blender 510 conditionally blends
or writes pixels into frame buffer 118A based on depth range (see Figure 33 which shows example z buffer formats
including a "dz" depth-range field). See U.S. Patent Application Serial No. 08/562,283 of Akeley et al. entitled "System
and Method For Merging Pixel Fragments Based On Depth Range Values", filed on November 22, 1996.

In this example, video interface 210 applies a spatial filter at frame buffer read-out time to account for surrounding
background colors to produce antialias silhouette edges. The antialiasing scheme requires ordered rendering sorted
by surface or line types. Here is the rendering order and surface/line types for z buffer antialiasing mode:

1. All opaque surfaces are rendered.

2. All opaque decal surfaces are rendered.

3. All opaque interpenetrating surfaces are rendered.

4. All of the translucent surfaces and lines are rendered last.

These can be rendered in any order, but proper depth order gives proper transparency.

The mode of blender 510 is controlled, in part, by the groups of coefficients specified in the triangle command defining
the primitive (see Appendix A). Thus, a primitive can be rendered in a z buffered mode or non-z buffered mode as
specified by the triangle command. In addition, the "set other modes" command (see Appendix A) specifies blend mode
words for cycle 0 and cycle 1 in addition to specifying "blend masks" and enabling/disabling antialiasing.

Blender 510 has two internal color registers: fog color and blend color. These values are programmable using the
"set fog color" and "set blend color" commands, respectively (see Appendix A). These values can be used for geometry
with constant fog or transparency.

Blender 510 can compare the incoming pixel alpha value with a programmable alpha source to conditionally update
frame buffer 118A. This feature can allow complex, outlined, billboard type objects, for example. Besides thresholding
against a value, blender 510 in this example can also compare against a dithered value to give a randomized particle
effect. See "set other modes" command (Appendix A). Blender 510 can also perform fog operations, either in 1-cycle
or 2-cycle mode. Blender 510 uses the stepped z value as a fog coefficient for fog and pipeline color blending.

Figure 31 shows an example of the overall operations performed by blender 510 in this example. In this particular
example, blender 510 can be operated in a mode in which a coverage value produced by coverage evaluator 524 can
be used to specify the amount of blending. Coverage evaluator 524 compares the coverage value of the current pixel
(provided by edgewalker 504) to the stored coverage value within frame buffer 118A. As shown in Figure 32 (a depiction
of the format of the color information stored for each pixel within color frame buffer 118A), the color of a pixel is
represented by 5 bits each of red, green, and blue data and by a 3-bit "coverage" value. This "coverage" value can be used
as-is, or multiplied by an alpha value for use as pixel alpha and/or coverage (see "set other modes" command in
Appendix A). The "coverage" value nominally specifies how much of a pixel is covered by a particular surface. Thus,
the coverage value outputted by edgewalker 504 will be 1 for pixels lying entirely within the interior of a primitive, and
some value less than 1 for pixels on the edge of the primitive. In this example, blender 510 uses the coverage value
for antialiasing. At the time blender 510 blends a primitive edge, it does not know whether the primitive edge is internal
to an object formed from multiple primitives or whether the edge is at the outer edge of a represented object. To solve
this problem in this example, final blending of opaque edge values is postponed until display time, when the video
interface 210 reads out frame buffer 118A for display purposes. Video interface 210 uses this coverage value to
interpolate between the pixel color and the colors of neighboring pixels in the frame buffer 118A. In order to accomplish
this antialiasing at display time, blender 510 must maintain the coverage value for each pixel within frame buffer 118A,
thereby allowing video interface 210 to later determine whether a particular pixel is a silhouette edge or an internal
edge of a multi-polygon object.

Memory Interface 512 and Z Buffering

Memory interface 512 provides an interface between display processor 500 and main memory 300. Memory
interface 512 is primarily used during normal display processor 500 operations to access the color frame buffer 118a
and the Z buffer 118b. Color frame buffer 118a stores a color value for each pixel on color television screen 60. The
pixel format is shown in Figure 32. Z buffer 118b stores a depth value and a depth range value for each color pixel
value stored in color frame buffer 118a. An example format for z buffer values is shown in Figure 33. The Z buffer 118b
is used primarily by blender 510 to determine whether a newly rendered primitive is in front of or behind a previously
rendered primitive (thereby providing hidden surface removal). The "DZ" depth range value shown in Figure 33 may
be used to help ascertain whether adjacent texels are part of the same object surface.

Memory interface 512 can write to main memory 300, read from main memory, or read, modify and write (RMW)
locations in the main memory. For RMW operations, memory interface 512, in this example, pre-fetches a row of pixels
from frame buffer 118a as soon as edgewalker 504 determines the x, y coordinates of the span. Memory interface
512 includes an internal "span buffer" 512a used to store this span or row of pixels. Memory interface 512 provides
the appropriate pre-fetched pixel value from span buffer 512a to blender 510 at the appropriate time, thus minimizing
the number of accesses to main memory 300. Span buffer 512a is also used to temporarily store blended (modified)
pixel values so that display processor 500 need not access main memory 300 each time a new pixel value is blended.
In general, memory interface 512 writes the entire span worth of pixels into main memory 300 as a block all at once.

Memory interface 512 has enough on-chip RAM to hold several span buffers. This can cause problems, however,
if two spans in sequence happen to overlap the same screen area. A parameter "atomic space" in the "Set Other
Modes" command (see Appendix A) forces memory interface 512 to write one primitive to frame buffer 118a before
starting the next primitive, thereby avoiding this potential problem by adding no cycles after the last span of a primitive
is rendered.

Depth comparator 526 operates in conjunction with z buffer 118b to remove hidden surfaces and to ensure that
transparent values are blended properly. Depth comparator 526 compares the z or depth value of the current pixel with
the z value currently residing in z buffer 118b for that screen location. At the beginning of the rendering of a new frame,
all locations in z buffer 118b are preferably initialized to maximum distance from the viewer (thus, any object will be
"in front of" this initialized value). Generally, each time display processor 500 is to blend a new pixel into frame
buffer 118a, depth comparator 526 compares the depth of the current pixel with the depth residing in that location of
z buffer 118b. If the old z buffer value indicates that the previously written pixel is "closer" to the viewer than is the new
pixel, the new pixel is discarded (at least for opaque values) and is not written into the frame buffer, thus accomplishing
hidden surface removal. If the new pixel is "closer" than the old pixel as indicated by depth comparator 526, then the new
pixel value (at least for opaque pixels) may replace the old pixel value in frame buffer 118a, and the corresponding
value in z buffer 118b is similarly updated with the z location of the new pixel (see Figure 33A). Transparency blending
may be accomplished by blending without updating the z buffer value, but nevertheless reading it first and not blending
if the transparent pixel is "behind" an opaque pixel.
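
The depth comparison rules described above can be sketched as follows. This is a simplified illustration, not the logic of depth comparator 526: it assumes that a smaller z value means closer to the viewer, that Z_FAR stands for the "maximum distance" initial value, and that a pixel at equal depth is discarded. The actual z format, including the depth range field, is the one shown in Figure 33 and is not modeled here.

```python
Z_FAR = 0xFFFF  # assumed stand-in for "maximum distance from the viewer"

def new_frame(width, height):
    """Create a color buffer and a z buffer initialized to the far value."""
    color = [[0] * width for _ in range(height)]
    depth = [[Z_FAR] * width for _ in range(height)]
    return color, depth

def write_pixel(color, depth, x, y, z, value, opaque=True, alpha=1.0):
    """Depth-test one pixel; returns True if the frame buffer was modified."""
    if z >= depth[y][x]:
        return False                      # hidden behind the stored pixel: discard
    if opaque:
        color[y][x] = value
        depth[y][x] = z                   # opaque pixels update the z buffer
    else:
        old = color[y][x]
        color[y][x] = old + alpha * (value - old)   # blend; z buffer left unchanged
    return True
```

Because a transparent pixel is tested against the z buffer but never written to it, opaque geometry rendered earlier continues to hide anything behind it.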

Video Interface 210 

Video interface 210 reads the data out of frame buffer 118 and generates the composite, S-video and RGB video output
signals. In this example, video interface 210 also performs anti-aliasing operations, and may also perform filtering to
remove truncation caused by the introduction of dithering noise.

Video interface 210 in this example works in either NTSC or PAL mode, and can display 15-bit or 24-bit color
pixels with or without filtering at both high and low resolutions. The video interface 210 can also scale up a smaller
image to fill the screen. The video interface 210 provides 28 different video modes plus additional special features.

Video interface 210 reads color frame buffer 118a in synchronization with the electron beam scanning the color
television screen 60, and provides RGB values for each pixel in digital form to video DAC 144 for conversion into analog
video levels in this example. Video interface 210 performs a blending function for opacity values based on coverage
(thereby providing an antialiasing function), and also performs a back-filtering operation to remove some of the noise
introduced by screen-based dithering.

Figure 34 is a block diagram of the architecture of video interface 210. In this example, video interface 210 includes
the DMA controller 900, a buffer 902, control logic 904, anti-aliasing filters 906a, 906b, error correction blocks 908a,
908b, vertical interpolator (filter) 910, horizontal interpolator (filter) 912, "random" function generator 914, gamma block
916, and bus driver 918.

DMA controller 900 is connected to coprocessor bus 214. DMA controller 900 reads color frame buffer 118a beginning
at an "origin" address in the main memory specified by main processor 100 (see Figure 35B). DMA controller 900
sequentially reads the pixel color and coverage values (see Figure 32) from frame buffer 118a in synchronism with the
line scanning operations of television 58. The pixel values read by DMA controller 900 are processed by the remainder
of video interface 210 and are outputted to video DAC 144 for conversion into an analog composite video signal in NTSC
or PAL format in this example.

DMA controller 900 in this example provides the color/coverage values it has read from main memory frame buffer 
118a, to a RAM buffer 902 for temporary storage. In this example, buffer 902 does not store the pixel color values 
corresponding to an entire line of television video. Instead, buffer 902 stores a plurality of blocks of pixel data, each 
block corresponding to a portion of a line of video. Buffer 902 provides "double buffering," i.e., it has sufficient buffers 
to make some line portions available to filters 906 while other buffers are being written by DMA controller 900. 

In this example, DMA controller 900 accesses, and stores into buffer 902, several blocks of pixel data corresponding 
to several horizontally-aligned portions of the video lines to be displayed on television screen 60. Looking at Figure 
34A, frame buffer 118a is shown -- for purposes of illustration -- as being organized in a row/column order corresponding 
to pixels on the television screen (it will be understood that the frame buffer as stored in main memory 300 may actually 
be stored as a long sequential list of pixel color/coverage values). In this example, DMA controller 900 reads out a 
block of pixel values corresponding to a particular segment of the current line n of video to be displayed (top shaded 
block in Figure 34A frame buffer 118a), and also reads out the pixel values corresponding to a horizontally-aligned (on 
the television screen) line segment of a "next" video line n+1 (i.e., the part of the pixel data representing the part of 
the "next" line just beneath the line n). In this particular example, DMA controller 900 also reads a further block of 
pixel values from the frame buffer corresponding to the horizontally-aligned line segment of video line n+2. 

Each of these blocks of pixel values is stored in buffer 902. Filters 906a, 906b perform a filtering/anti-aliasing 
operation based on coverage value to interpolate the current line's pixel values with neighboring pixel values (i.e., pixel 
values that are adjacent with respect to the displayed position on color television screen 60). The anti-aliasing filtering 
operations performed by filters 906a, 906b are as described in U.S. Patent Application Serial No. 08/539,956 of Van 
Hook et al, entitled "Antialiasing of Silhouette Edges", filed on October 6, 1996. Briefly, a three-scan-line high 
neighborhood is color weighted by coverage value in a blending process performed by filter 906. This filtering operation 
results in smoother, less jagged lines at surface edges by using the pixel coverage value retained in frame buffer 118a 
(which coverage value indicates what percentage of the pixel is covered by a polygon) to adjust the contribution of that 
pixel value relative to the contributions of neighboring pixel values in a blending process to produce the current pixel 
value. "Divot" error correction blocks 908a, 908b correct the outputs of anti-alias filters 906a, 906b for slight artifacts 
introduced by the anti-aliasing process. In particular, for any pixels on or adjacent to a silhouette edge, the error 
correction blocks 908 take the median of three adjacent pixels as the color to be displayed in place of the center pixel. 
This error correction can be enabled or disabled under software control (see Figure 35A), and a video game programmer 
may wish to disable the error correction since it interacts poorly with decal line rendering modes. 
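
As an illustration only, the median-of-three operation attributed to error correction blocks 908 might be sketched 
in Python as follows. The function names, the per-channel treatment of color, and the on_edge flags are assumptions 
made for this sketch; the specification does not state how edge pixels are identified or how the median is formed. 

```python
def median3(a, b, c):
    """Return the median of three values."""
    return max(min(a, b), min(max(a, b), c))


def divot_correct(line, on_edge):
    """Replace flagged pixels with the median of their 3-pixel neighborhood.

    line    -- list of (r, g, b) tuples for one scan line
    on_edge -- list of booleans (hypothetical flag: the pixel is on or
               adjacent to a silhouette edge)
    The median is taken per color channel, which is an assumption.
    """
    out = list(line)
    for x in range(1, len(line) - 1):
        if on_edge[x]:
            left, mid, right = line[x - 1], line[x], line[x + 1]
            out[x] = tuple(median3(l, m, r) for l, m, r in zip(left, mid, right))
    return out
```

In this sketch a pixel that is not flagged passes through unchanged, which mirrors the statement that the correction 
applies only to pixels on or adjacent to a silhouette edge. 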

Anti-aliasing filters 906a, 906b operate in parallel in this example to produce pixel data blocks corresponding to 
horizontally aligned portions of two successive lines (line n, line n+1) of the image represented by frame buffer 118a. 
These pixel values are provided to vertical interpolator 910, which performs a linear interpolation between the two 
image lines to produce an image portion of a single scan line (see Figure 34A). Interpolator 910 interpolates between 
successive scan lines in order to reduce flicker in interlaced displays. For example, interpolator 910 can add in a 
contribution from a previous or next successive horizontally-aligned scan line portion to make transitions between 
successive video scan lines less noticeable -- thereby reducing flicker. 

Additionally, interpolator 910 in this example can perform a vertical scaling function that allows the number of lines 
displayed on television screen 60 to be different from the number of lines represented by the frame buffer 118a pixel 
information. In this example, vertical filter 910 scales in the vertical dimension by resampling the pixel data for 
successive lines of image represented by frame buffer 118a -- thereby allowing television screen 60 to have a different 
number of lines. This scaling operation (which also accommodates offsetting) is controlled by the values within the video 
interface Y scale register (see Figure 35N). The ability to scale the television image relative to the digital image size 
of frame buffer 118a provides additional flexibility. For example, the scaling ability makes it possible for signal 
processor 400 and display processor 500 to generate a smaller digital image representation in frame buffer 118 -- and yet 
allow that smaller image to fill the entire television screen 60. Since a smaller frame buffer 118 requires less time to 
rasterize (i.e., display processor 500 needs to handle fewer spans and fewer pixels per span for a given polygon) and less 
memory to store, the scaling ability can provide increased performance -- albeit at the cost of a lower resolution image. 
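
The vertical resampling described above can be sketched roughly as follows. This is a minimal Python model, not the 
hardware algorithm: the function name scale_lines is hypothetical, and the mapping that aligns the first and last 
output lines with the first and last frame buffer lines is an assumption standing in for the Y scale and offset values. 

```python
def scale_lines(frame, out_lines):
    """Resample a list of scan lines to out_lines lines by linear interpolation.

    frame is a list of rows, each row a list of intensity values.
    """
    src = len(frame)
    result = []
    for y in range(out_lines):
        # Map the output line back to a fractional source line position.
        pos = y * (src - 1) / (out_lines - 1) if out_lines > 1 else 0.0
        n = int(pos)                 # line n
        frac = pos - n               # weight given to line n+1
        nxt = min(n + 1, src - 1)    # clamp at the bottom of the image
        result.append([(1.0 - frac) * a + frac * b
                       for a, b in zip(frame[n], frame[nxt])])
    return result
```

For instance, a two-line image resampled to three lines yields a middle line that is the average of the two source 
lines, which is the same blending that softens transitions between successive scan lines. 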

The output of vertical filter 910 in this example is a block of pixel data representing the pixel values for a portion 
of the video line to be displayed. As shown in Figure 34A, this block of pixel values is provided to horizontal interpolator 
912. Horizontal interpolator 912 provides a linear interpolation between neighboring pixel values in order to resample 
the pixels based on a horizontal scaling factor stored in the X scale register (see Figure 35M). Horizontal interpolator 
912 thus provides a horizontal scaling ability, e.g., to convert a smaller number of frame buffer values into a larger 
number of screen pixels along a horizontal line. 
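
One way to picture register-driven horizontal scaling is a fixed-point position that advances by a constant step for 
each output pixel. The Python sketch below assumes a 10-bit fractional step and a subpixel offset; these formats, and the 
name scale_row, are hypothetical and are not taken from Figure 35M. 

```python
FRAC_BITS = 10  # hypothetical fixed-point precision of the scale factor


def scale_row(row, out_width, step_fx, offset_fx=0):
    """Resample one row of intensities.

    step_fx   -- source pixels advanced per output pixel, in fixed point
    offset_fx -- starting subpixel offset, in fixed point
    """
    one = 1 << FRAC_BITS
    out = []
    pos = offset_fx
    for _ in range(out_width):
        i = min(pos >> FRAC_BITS, len(row) - 1)   # left neighbor
        j = min(i + 1, len(row) - 1)              # right neighbor (clamped)
        frac = pos & (one - 1)                    # weight of right neighbor
        out.append((row[i] * (one - frac) + row[j] * frac) >> FRAC_BITS)
        pos += step_fx
    return out
```

With a step of one half (512 in this format), two frame buffer values expand into four screen pixels, the second of which 
is the midpoint of the two source values. 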

The output of horizontal interpolator 912 is provided to a gamma correction circuit 916 that converts linear RGB 
intensity into non-linear intensity values suitable for composite video generation, compensating for the gamma 
non-linearity of TV monitors. This amounts to taking a square root of the linear color space. The TV monitor effectively 
raises these color values to a power of 2.2 or 2.4. A "random" function block 914 introduces additional bits of resolution 
to each of the R, G and B color values in order to "de-dither" (i.e., to compensate for the bit truncation performed by 
display processor dithering block 522). As shown in Figure 32, one example frame buffer 118 color pixel format in this 
example provides only five bits of resolution of each R, G and B to conserve storage space within main memory 300. Display 
processor dithering block 522 may truncate 8-bit RGB color values provided by blender 510 to provide the compressed 
representation shown in Figure 32. Block 914 can reverse this truncation process to decompress the RGB values to provide 
256 different display color levels for each R, G and B. See U.S. Application Serial No. 08/561,584 of Carrol Philip 
Gossett, entitled "Restoration Filter For Truncated Pixels," filed on November 21, 1996 (Atty. Dkt. No. SKGF 
1452.1690000). This dither filter operation can be turned on and off under software control (see Figure 35A). 
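
The square-root relationship can be checked numerically. The following Python sketch assumes 8-bit intensities 
and uses hypothetical function names; it models only the arithmetic stated in the text, not the circuit. 

```python
import math


def gamma_correct(value, max_value=255):
    """Square-root gamma: map a linear intensity to a non-linear one."""
    return round(math.sqrt(value / max_value) * max_value)


def display_response(value, gamma=2.2, max_value=255):
    """Approximate monitor response: raise the normalized value to a power."""
    return (value / max_value) ** gamma * max_value
```

A linear value of 64 is corrected to 128, and a display with an exponent of exactly 2 would bring it back to about 64. With 
the exponent of 2.2 or 2.4 mentioned above, the square root is an approximation rather than an exact inverse. 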

Example Video Interface Registers 

There are sixteen control registers for the video interface 210 which control all its functions including sync 
generation, video rescaling, and anti-aliasing. Figures 35A-35P show the various registers within video interface 210 
that can be accessed by main processor 100. 

Figure 35A shows the video interface control register 952. Main processor 100 can write the following values into 
this register 952 to control the operation of video interface 210: 

• Type field 952a specifies pixel data size as blank (no data, no sync), the format shown in Figure 32 (5 bits each of 
RGB and a 3-bit coverage value), or 8/8/8/8 (8 bits each of RGB and 8 bits of coverage, for 32 bits per pixel); 

• Gamma dither enable field 952b turns on and off the addition of some random noise to the least significant bits of 
the video out before the final quantization to 7 bits to eliminate Mach banding artifacts; 

• Gamma enable field 952c turns on and off gamma correction; 

• Divot enable field 952d turns on and off the divot error correction discussed above; 

• Video bus clock enable field 952e turns an internal clock on or off; 

• Interlace field 952f turns interlacing on and off; 

• Test mode field 952g; 

• Anti-alias mode on/off field 952h; 

• Diagnostic field 952i; 

• Pixel advance field 952j; and 

• Dither filter enable field 952k. 
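
Software would typically assemble such a control word by shifting each field into place. The Python sketch below 
shows the general technique only. Every bit position and width in FIELDS is hypothetical (the real layout is given by 
Figure 35A, which is not reproduced here), as are the helper names. 

```python
# Hypothetical (shift, width) pairs; the actual layout is defined by Figure 35A.
FIELDS = {
    "type": (0, 2),
    "gamma_dither": (2, 1),
    "gamma": (3, 1),
    "divot": (4, 1),
    "interlace": (6, 1),
    "dither_filter": (16, 1),
}


def pack_control(**values):
    """Combine named field values into one register word."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word |= value << shift
    return word


def unpack_field(word, name):
    """Extract one named field from a register word."""
    shift, width = FIELDS[name]
    return (word >> shift) & ((1 << width) - 1)
```

For example, pack_control(type=2, gamma=1, divot=1) builds a word in which only those three fields are set, and 
unpack_field recovers each value from it. 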

Figure 35B shows the video interface origin register 954 used to specify the beginning main memory 300 address 
of frame buffer 118a for read out. In this example, main processor 100 needs to explicitly set this register 954 each 
time video interface 210 is to read from a new area in main memory 300 (e.g., to read the other half of double buffered 
frame buffer 118). 

Figure 35C shows the video interface line width register 956, which can be set to specify the number of pixels in 
each horizontal line. Figure 35D shows the video interface vertical interrupt register 958, which main processor 100 
can set with a particular vertical line number so that coprocessor 200 will interrupt the main processor once per frame 
at the specified vertical line or half line. Figure 35E shows the video interface current line register 960, which specifies 
the current vertical line when read from by the main processor 100 and clears the vertical line interrupt when written 
to by the main processor. 

The registers 962-972 shown in Figures 35F-35L are used by main processor 100 to specify detailed composite video 
timing parameters. For example: 

• Figure 35F shows the video interface timing register 962, which main processor 100 can write to specify 
horizontal sync pulse width, color burst width, vertical sync pulse width, and color burst start timing. 

• Figure 35G shows the video interface vertical sync register 964 that main processor 100 may write to specify the 
number of vertical half-lines per field. 

• Figure 35H shows the video interface horizontal sync register 965, which main processor 100 can write to specify 
the total duration of a line and a horizontal "leap pattern" for PAL. 

• Figure 35I shows the video interface h sync leap register 966, specifying two alternate h sync leap parameters for 
PAL. 

• The video interface horizontal video register and vertical video register 968, 970 shown in Figures 35J, 35K, 
respectively, are used to specify horizontal and vertical video start and end times relative to hsync and vsync. 

• The video interface vertical burst register 972 shown in Figure 35L specifies color burst start and end timing. 

The timing parameters programmable into registers 962-972 can be used to provide compatibility with different 
kinds of television sets 58. For example, most television sets 58 in the United States use a composite video format 
known as NTSC, whereas most European television sets use a composite video format known as PAL. These formats 
differ in terms of their detailed timing parameters (e.g., vertical blanking interval width and location within the signal 
pattern, horizontal synchronization pulse width, color burst signal pulse width, etc.). Because registers 962-972 control 
these composite video timing parameters and are programmable by software executing on main processor 100, a 
programmer of video game 108 can make her program NTSC compatible, PAL compatible, or both (as selected by a 
user) by including appropriate instructions within the video game program that write appropriate values to registers 
962-972. Thus, in this example, coprocessor 200 is compatible with NTSC-standard television sets 58, PAL standard 
compatible television sets, and even with video formats other than these within a range as specified by the contents of 
registers 962-972. 

Video interface X and Y scale registers 974, 976 (see Figures 35M, 35N, respectively) specify x and y scale up 
and subpixel offset parameters for horizontal and vertical scaling, as discussed above. Figures 35O and 35P show 
video interface test data and address registers 978, 980 for diagnostic purposes. 

Memory Controller/Interface 212 

As explained above, coprocessor memory interface 212 interfaces main memory 300 with coprocessor internal 
bus 214. In this example, main memory 300 is accessed over a 9-bit wide bus, and one of the tasks memory interface 
212 is responsible for is to buffer successive 9-bit words so they can be more conveniently handled by coprocessor 
200. Figure 36 is an example diagram showing the overall architecture of memory controller/interface 212. 

In this example, memory interface/controller 212 includes a pair of registers/buffers 1000, 1002, a control block 
1004, and a RAM controller block 212b. RAM controller block 212b comprises RAM control circuits designed and 
specified by Rambus Inc. for controlling main memory 300. Registers 1000, 1002 are used to latch outgoing and incoming 
data, respectively. Control block 1004 controls the operation of memory interface 212. 

Example Memory Controller/Interface Registers 

Figures 37A-37H show example control registers used by main processor 100 to control memory interface 212. 

Figure 37A shows a read/write mode register 1052 specifying operating mode and whether transmit or receive is 
active. Figure 37B shows a configuration register 1054 that specifies current control input and current control enable. 
Figure 37C represents a current mode register 1056 that is write only, with any writes to this register updating the 
current control register. Figure 37D shows a select register 1058 used to select receive or transmit. Figure 37E shows 
a latency register 1060 used to specify DMA latency/overlap. Figure 37F shows a refresh register 1062 that specifies 
clean and dirty refresh delay, indicates the current refresh bank, indicates whether refresh is enabled, indicates whether 
refresh is optimized, and includes a field specifying refresh multi-bank device. Figure 37G shows an error register 
which in a read mode indicates NACK, ACK and over-range errors, and when written to by main processor 100 clears 
all error bits. Figure 37H shows a bank status register 1066 which, when read from, indicates valid and dirty bits of the 
current bank, and when written to clears valid and sets dirty bits of the current bank. 

CPU INTERFACE 

Figure 38 shows a block diagram of coprocessor CPU interface 202 in this example. CPU interface 202 comprises 
a FIFO buffer 1102 and a control block 1104. FIFO buffer 1102 provides bidirectional buffering between the CPU SysAD 
multiplexed address/data bus 102a and the coprocessor multiplexed address/data bus 214D. Control block 1104 
receives addresses asserted by the main processor 100 and places them onto the coprocessor address bus 214C. 
Control block 1104 also receives interrupt signals from the other parts of coprocessor 200, and receives command 
control signals from the main processor 100 via SysCMD bus 102b. 

Example CPU Interface Registers 

Figures 39A-39D show the registers contained within CPU interface 202 in this example. Figure 39A shows a CPU 
interface status/control register 1152 that controls coprocessor 200 when main processor 100 writes to the register 
and indicates overall coprocessor status when the main processor reads from the register. Main processor 100 can 
write to register 1152 to specify initialization code length, set or clear initialization mode, set or clear internal 
coprocessor bus test mode, clear display processor 500 interrupt, and set or clear main memory register mode. When main 
processor 100 reads from this register 1152, it can determine initialization code length, initialization mode, internal 
coprocessor bus test mode, and whether the coprocessor is operating in the main memory register mode. 

Figure 39B shows a version register 1154 that main processor 100 can read from to determine version information 
pertaining to various components within coprocessor 200. 

Figure 39C shows an interrupt register 1156 that main processor 100 can read from to determine the source of an 
interrupt it has received from coprocessor 200. In this example, a single line connecting coprocessor 200 and 
main processor 100 is used for interrupt purposes. Upon receiving a coprocessor interrupt, main processor 100 can 
read interrupt register 1156 (which contains an interrupt vector) to ascertain what component within coprocessor 200 
(i.e., signal processor 400, serial interface 204, audio interface 208, video interface 210, parallel interface 206, or 
display processor 500) caused the interrupt. Figure 39D shows an interrupt mask register 1158 which main processor 100 
can write to set or clear an interrupt mask for any of the interrupts specified in interrupt register 1156, and may read 
to determine which interrupts are masked and which are not. 
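
Because a single interrupt line is shared, software must decode the interrupt register to find the requesting 
component. A minimal Python sketch of that decode follows. The bit order in SOURCES and the convention that a set mask 
bit enables the corresponding interrupt are both assumptions; the text names the six sources but not their encoding. 

```python
# Hypothetical bit assignment for the six interrupt sources named in the text.
SOURCES = [
    "signal processor",
    "serial interface",
    "audio interface",
    "video interface",
    "parallel interface",
    "display processor",
]


def pending_sources(interrupt_reg, mask_reg):
    """Return the names of sources that are asserted and enabled by the mask."""
    active = interrupt_reg & mask_reg
    return [name for bit, name in enumerate(SOURCES) if active & (1 << bit)]
```

An interrupt handler built this way would service each returned source in turn and then clear the corresponding 
interrupt at the interface that raised it. 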

AUDIO INTERFACE 

Figure 40 shows an overall block diagram architecture of audio interface 208 in this example. Audio interface 208 
includes DMA logic 1200, a state machine/controller 1202, an audio clock generator 1204, audio data buffers 1206 
and a serializer 1208. In this example, DMA logic 1200 fetches digital audio sample data from audio buffer 114 within 
main memory 300. DMA logic 1200 writes this audio sample data, 8 bytes at a time, into audio data buffers 1206. There 
are multiple audio data buffers 1206 arranged in a FIFO so that DMA logic 1200 can be prefetching some audio sample 
data while serializer 1208 serializes other, previously fetched-and-buffered audio sample data. Thus, buffers 1206 
store enough data to supply serializer 1208 between block reads by DMA logic 1200. Since the output rate of serializer 
1208 is relatively slow (e.g., on the order of 4 bytes at 50 kHz), a single 64-bit buffer 1206b can store enough digitized 
audio samples to last a relatively long time in terms of real time audio output. 
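
The buffering margin can be estimated with simple arithmetic. The Python sketch below reads "4 bytes at 50 kHz" as one 
4-byte stereo sample pair per sample period, which is an interpretation rather than something the text states outright; 
the function name is hypothetical. 

```python
def buffer_duration_us(buffer_bytes=8, bytes_per_sample=4, sample_rate_hz=50_000):
    """Microseconds of audio output that one buffer can supply."""
    return buffer_bytes * 1_000_000 // (bytes_per_sample * sample_rate_hz)
```

Under those assumptions a single 64-bit (8-byte) buffer covers two sample periods, or 40 microseconds, which is long 
compared with the time a block read from main memory would be expected to take. 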

As discussed above, serializer 1208 converts the parallel contents of audio buffers 1206 into serial format, and places 
the resulting serial digital audio data stream onto bus 209 for communication to audio DAC 140. Digital audio bus 209 
in this example includes a single serial data line 209a multiplexed between left channel data and right channel data. 
In this example, serializer 1208 outputs a 16-bit long word for each stereo channel, alternating between the channels. 
The output bit rate of serializer 1208 is specified by audio clock generator 1204. Audio clock generator 1204 produces 
an audio clock output on line 209b to synchronize audio DAC 140 to the serializer 1208 output bit rate, and produces an 
audio L/R clock on line 209c specifying whether the current serializer 1208 output is for the left or right stereo channel. 
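
The alternating left/right serial stream can be modeled as a generator of (L/R clock, data bit) pairs. In this Python 
sketch the L/R clock polarity (0 for left, 1 for right) and the most-significant-bit-first order are assumptions, and the 
function name is hypothetical. 

```python
def serialize_stereo(samples):
    """Yield (lr_clock, bit) pairs for a sequence of (left, right) 16-bit words.

    lr_clock is 0 while a left-channel word is shifted out and 1 for the
    right channel (assumed polarity). Bits are emitted MSB first (assumed).
    """
    for left, right in samples:
        for channel, word in ((0, left), (1, right)):
            for bit in range(15, -1, -1):
                yield channel, (word >> bit) & 1
```

Each stereo sample pair therefore occupies 32 bit times on the single data line, 16 for each channel. 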

Figure 40 shows a number of registers and counters used to control audio interface 208. DMA controller 1200 
receives a starting main memory address from an address register 1210. Main processor 100 writes to this address 
register 1210 (see Figure 41A) to point audio interface 208 to the locations in main memory 300 providing the audio 
buffer 114 for the current audio to be played. A counter 1212 increments this address for each fetch by DMA controller 
1200, thereby sequencing the DMA controller through the entire audio buffer 114. Main processor 100 writes the length 
of audio buffer 114 into a transfer length register 1214 (see Figure 41B). An additional counter 1216 associated with 
length register 1214 sequences state machine 1202 through an appropriate number of control states corresponding 
to the length of audio buffer 114. State machine 1202 generates control signals that synchronize the operations of the 
other parts of audio interface 208 relative to one another. In this example, main processor 100 can enable audio interface 
208 to begin fetching data from the main memory 300 by writing to a DMA enable register location 1217 (not shown 
in Figure 40; see Figure 41C). Main processor 100 may also determine the state of audio interface 208 by reading an 
audio interface status register 1218 (not shown in Figure 40; see Figure 41D). In this example, state machine 1202 
generates a main processor interrupt when it reaches the end of audio buffer 114 as specified by length register 1214, 
and the main processor 100 can clear this interrupt by writing to the status register 1218 location (see Figure 41D). 

In this example, main processor 100 may also control the rate of the clocking signals generated by audio clock 
generator 1204. Main processor 100 can program these rates by writing to audio rate registers 1218, 1220 (see Figures 
41E, 41F). A counter 1222 may provide a programmable dividing function based on the rate values main processor 
100 has written into audio rate registers 1218, 1220. 
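
The interplay of the address counter, the length counter and the end-of-buffer interrupt might be modeled as below. 
This Python class is a behavioral sketch only: the class and method names are hypothetical, and main memory is 
represented by a bytes object. 

```python
FETCH_BYTES = 8  # the text states that sample data is fetched 8 bytes at a time


class AudioDma:
    """Behavioral model of the audio DMA sequencing described above."""

    def __init__(self, memory):
        self.memory = memory    # stand-in for main memory
        self.address = 0        # models the address counter
        self.remaining = 0      # models the length counter
        self.enabled = False
        self.interrupt = False

    def start(self, address, length):
        """Model writes to the address, length and DMA enable registers."""
        self.address, self.remaining = address, length
        self.enabled, self.interrupt = True, False

    def fetch(self):
        """Fetch the next block, or return None when there is nothing to do."""
        if not self.enabled or self.remaining <= 0:
            return None
        n = min(FETCH_BYTES, self.remaining)
        block = self.memory[self.address:self.address + n]
        self.address += n
        self.remaining -= n
        if self.remaining == 0:
            self.interrupt = True   # end of the audio buffer reached
        return block

    def clear_interrupt(self):
        """Model a write to the status register location."""
        self.interrupt = False
```

A game program following this pattern would typically respond to the interrupt by pointing the interface at the next 
audio buffer and starting another transfer. 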

SERIAL INTERFACE 

Figure 42 shows an overall high level block diagram of serial interface 204 in this example. 
In this example, serial interface 204 moves blocks of data between coprocessor 200 and serial peripheral interface 
138. Serial interface 204 can either read a 64-byte data block from serial peripheral interface 138 and transfer it to a 
specified location in main memory 300 or alternatively, it can read a 64-byte block of data stored in the main memory 
and transfer it serially to the serial peripheral interface. In this example, serial interface 204 comprises primarily direct 
memory access logic 1300, control logic 1302, and a parallel/serial converter 1304. Parallel/serial converter 1304 in 
this example comprises a shift register that converts serial data sent by serial peripheral interface 138 over a read 
data/acknowledge bus 205a into parallel data for application to latch 1308. The contents of latch 1308 are then applied 
to coprocessor data bus 214d for writing into main memory 300. Alternatively, in a parallel-to-serial conversion mode, 
shift register 1304 receives parallel data from the coprocessor data bus 214d via a latch 1310 and converts that data 
into serial form for transmission to serial peripheral interface 138 via a command and write data bus 205b. 
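
The two conversion directions of a shift register can be illustrated in a few lines of Python. The 8-bit word width 
and the most-significant-bit-first order used here are assumptions for the sketch, and the function names are 
hypothetical. 

```python
def serial_to_parallel(bits, width=8):
    """Shift bits in one at a time and emit a word after every `width` bits."""
    words, reg, count = [], 0, 0
    for bit in bits:
        reg = ((reg << 1) | (bit & 1)) & ((1 << width) - 1)
        count += 1
        if count == width:
            words.append(reg)       # word is latched for the parallel bus
            reg, count = 0, 0
    return words


def parallel_to_serial(words, width=8):
    """Shift each word out one bit at a time, most significant bit first."""
    return [(w >> i) & 1 for w in words for i in range(width - 1, -1, -1)]
```

A 64-byte block sent through parallel_to_serial and back through serial_to_parallel is recovered unchanged, 
occupying 512 bit times on the serial link under these assumptions. 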

Main processor 100 specifies the address within main memory 300 that serial interface 204 is to read from or write 
to, by writing this address into an address register 1312 (see Figure 43A). Address register 1312 contents specify the 
main memory address to be loaded in DMA address counter 1314. Part of the contents of address register 1312 may 
also be used to specify "address" information within serial peripheral interface 138. Such serial peripheral interface 
"address" information is loaded into a latch 1316, the contents of which are provided to shift register 1304 for 
transmission to the serial peripheral interface. This serial peripheral interface "address" information may be used, for 
example, to specify a location within the serial peripheral interface 138 (i.e., a boot ROM location 158, a RAM buffer or 
a status register). 

In this example, serial interface 204 has the ability to place the shift register 1304 parallel output onto the 
coprocessor address bus 214c via register 1308, a multiplexer 1318, and a latch 1320. 

As shown in Figures 43B, 43C, main processor 100 in this example specifies the direction of serial transfer by 
writing to a location 1322 or 1324. A write to location 1322 causes serial interface 204 to read a 64-byte data block 
from the serial peripheral interface 138 and write it to the main memory 300 location specified by address register 
1312. A write by main processor 100 to register location 1324 causes serial interface 204 to read a 64-byte block of 
data from the main memory 300 location specified by address register 1312, and to write the data in serial form to the 
serial peripheral interface 138. 

Figure 43D shows the serial interface status register 1326. Main processor 100 can read status register 1326 to 
determine the status of serial interface 204 (e.g., whether the serial interface is busy with a DMA or I/O operation (fields 
1328(1), 1328(2), respectively); whether there has been a DMA error (field 1328(3)); or whether the serial interface 
has caused a main processor interrupt (field 1328(4))). Serial interface 204 may generate a main processor interrupt 
each time it has completed a data transfer to/from serial peripheral interface 138. Main processor 100 can clear the 
serial interface interrupt by writing to register 1326. 

PARALLEL PERIPHERAL INTERFACE 

Figure 44 shows an example block diagram of parallel peripheral interface 206. In this example, parallel interface 
206 transfers blocks of data between main memory 300 and storage device 54. Although storage device 54 described 
above includes only a read-only memory 76 connected to parallel bus 104, system 50 can accommodate different 
configurations of peripherals for connection to connector 154. For example, two different types of peripheral devices 
(e.g., a ROM and a RAM) may be connected to peripheral connector 154. Peripheral interface 206 is designed to 
support communications between two different types of peripheral devices connected to the same parallel bus 104 
without requiring any time-consuming reconfiguration between writes. 

Some such peripheral devices may be read-only (e.g., ROM 76), other such peripheral devices may be read/write 
(e.g., a random access memory or a modem), and still other such peripheral devices could be write only. Peripheral 
interface 206 supports bi-directional, parallel transfer over parallel bus 104 between connector 154 and main memory 
300. 

Parallel peripheral interface 206 in this example includes a DMA controller 1400, a control/register block 1402, 
and a register file 1404. Register file 1404 buffers blocks of data being transferred by peripheral interface 206 between 
a peripheral device connected to connector 154 and a block of storage locations within main memory 300. In this 
example, register file 1404 comprises a small RAM that stores 16 64-bit words. Register file 1404 operates as a FIFO, 
and is addressed by control/register block 1402. The output of register file 1404 is multiplexed into 16-bit portions by 
multiplexer 1406. These 16-bit-wide values are latched by a latch 1408 for application to the peripheral device 
connected to connector 154 via a multiplexed address/data bus 104ad. Data read from the peripheral device via the 
multiplexed address/data bus 104ad is temporarily stored in a latch 1410 before being applied (via a multiplexer 1412 that 
also positions the 16-bit read value within an appropriate quarter of a 64-bit word) into register file 1404. Multiplexer 
1412 also receives data from coprocessor data bus 214d via latch 1414, and can route this received data into register 
file 1404 for storage. The register file 1404 output can also be coupled to coprocessor data bus 214d via latch 1416. 
In this example, the register file 1404 output may also be coupled to the coprocessor address bus 214c via a multiplexer 
1418 and a latch 1420. 
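
The width conversion between 64-bit register file words and the 16-bit peripheral bus can be sketched as follows. 
This Python example assumes the most significant quarter is transferred first, which the text does not specify, and the 
function names are hypothetical. 

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def split_word64(word):
    """Split a 64-bit word into four 16-bit portions (most significant first)."""
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def place_quarter(word, quarter, value16):
    """Write a 16-bit value into quarter 0..3 of a 64-bit word.

    Quarter 0 is taken to be the most significant 16 bits (assumed order).
    """
    shift = 48 - 16 * quarter
    cleared = word & ~(0xFFFF << shift) & MASK64
    return cleared | ((value16 & 0xFFFF) << shift)
```

Splitting a word and then placing the four portions back in order reconstructs the original value, which corresponds 
to a write toward the peripheral followed by a read back into the register file. 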

Main processor 100 controls the parameters of a DMA transfer performed by peripheral interface 206 by writing 
parameters into control/register block 1402. For example, main processor 100 can write a starting main memory 
address into a DRAM address register 1422 (see Figure 45A) and can specify a starting address within the address space 
of a peripheral device connected to connector 154 by writing a peripheral bus starting address into the peripheral bus 
register 1424 (see Figure 45B). In this example, main processor 100 specifies the length and direction of transfer by 
writing to one of registers 1426, 1428 shown in Figures 45C, 45D, respectively. A write to read length register 1426 
shown in Figure 45C controls the peripheral interface 206 to transfer in one direction, whereas writing a length value 
into register 1428 shown in Figure 45D causes the peripheral interface to transfer in the opposite direction. In this 
example, the main processor 100 can read the status of peripheral interface 206 by reading from a status register location 
1430(R) (see Figure 45E). This status register 1430(R) contains fields 1432 indicating DMA transfer in progress (field 
1432(1)), I/O operation in process (field 1432(2)), and an error condition (field 1432(3)). By writing to the same register 
1430(W) location, main processor 100 can clear an interrupt peripheral interface 206 generates when it has completed a 
requested transfer. Writing to status register location 1430(W) also allows main processor 100 to both clear an interrupt 
and abort a transfer in progress (see Figure 45E field 1434(1)). 

Figures 45F, 45G, 45H, 45I show additional registers main processor 100 can write to in order to control timing 
and other parameters of the peripheral interface bus 104. These registers permit main processor 100 to configure the 
bus 104 for particular types of peripheral devices, all under control of software within game program 108. In this 
example, peripheral interface 206 supports duplicate sets of registers 1436, 1438, 1440 and 1442 shown in Figures 
45F-45I, allowing different peripheral bus 104 protocols to be used for different peripheral devices connected 
simultaneously to the bus without requiring the main processor 100 to re-write the configuration registers each time it 
requests access to a different device. In this example, one set of configuration registers 1436, 1438, 1440 and 1442 is 
used to configure the bus 104 protocol whenever the peripheral interface 206 accesses a "region 1" address space within 
the 16-bit peripheral address space, and the other set of register parameters is used whenever the peripheral interface 
accesses a "region 2" address space within the peripheral bus address range (see Figure 5D memory map). The 
configurations specified by these two sets of registers are invoked simply by main processor 100 writing to the 
appropriate region. 
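
Selecting a configuration set by address region, rather than by rewriting registers, can be illustrated with a short 
Python sketch. The region boundaries below are hypothetical (the actual regions are defined by the Figure 5D memory 
map), as are the class and method names; the register reference numerals are used only as dictionary keys. 

```python
# Hypothetical address boundaries within the 16-bit peripheral address space.
REGIONS = {
    "region 1": range(0x0000, 0x8000),
    "region 2": range(0x8000, 0x10000),
}


class PeripheralBusConfig:
    """Holds one set of configuration registers 1436-1442 per region."""

    def __init__(self):
        self.sets = {name: {1436: 0, 1438: 0, 1440: 0, 1442: 0}
                     for name in REGIONS}

    def write(self, region, register, value):
        """Model the main processor writing one configuration register."""
        self.sets[region][register] = value

    def config_for(self, address):
        """Return the register set that applies to a peripheral bus address."""
        for name, span in REGIONS.items():
            if address in span:
                return self.sets[name]
        raise ValueError("address outside both regions")
```

Once both sets have been written, accesses to either region pick up the matching protocol settings with no further 
register writes, which is the benefit the duplicate register sets are described as providing. 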

The various ones of control registers shown in Figures 45A-45I may, in this example, be located within the control/ 
register block 1402 of Figure 44. The configuration values stored in registers 1436, 1438, 1442 are used in this example 
to control the timing of the access control signals control/register block 1402 produces on bus control line 1404C. A 
latch 1434 is used to temporarily latch addresses on the co-processor address bus 214C for application to control/ 
register block 1402 (e.g., to select between the various registers). Control/register block 1402 in this example includes 
appropriate counters and the like to automatically increment DMA addresses. 

While the invention has been described in connection with what is presently considered to be the most practical 
and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, 
but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit 
and scope of the appended claims. 



APPENDIX A: Display Processor 500 Graphic Display Command Example 
Formats and associated functions of this example 

Claims 

1. An interactive video game system comprising; 
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an interactive user input device; 

a main processor coupled to the input device, the main processor having an address space, the main processor 
interactively selecting a point of view in response to inputs from the user input device; 
a coprocessor coupled to the main processor, the coprocessor providing a predetermined graphics feature 
set for interactively generating image data in response to the selected point of view by projecting polygons 
representing a three dimensional world onto a two dimensional viewing plane, the coprocessor including: 

a signal processor that is shared between at least graphics functions and audio processing functions, the 
signal processor including a scalar unit and a vector unit, the vector unit capable of performing plural 
calculations in parallel, the signal processor including a microcode store that stores microcode, the signal 
processor executing the microcode in the microcode store to perform the graphics and audio processing 
functions; 

a display processor comprising display pipeline hardware that alternatively provides a one-pixel-per-cycle 
mode and a two-pixel-per-cycle mode to minimize hardware while providing a rich feature set including 
level-of-detail processing, the display pipeline hardware including a texture memory having first and second 
parts, the texture memory first part being capable of storing texture maps that are color indexed and 
texture maps that are not color indexed, the texture memory second part being capable of storing texture 
maps and/or color lookup tables for the color indexed texture maps, 
a video interface, 
an audio interface, 

a serial interface, and 
a parallel peripheral interface, 

wherein each of the signal processor, the display processor, the video interface, the audio interface, the 
serial interface and the parallel peripheral interface includes circuitry for accessing a main memory; 

the main memory being coupled to the coprocessor via a 9 bit wide bus, the main memory providing a common 
address space for the coprocessor and the main processor, the main memory storing at least the following 
data structures: 

instructions for execution by the main processor; 
a color frame buffer; 
a depth buffer; 
graphics microcode; 
audio processing microcode; 
at least one display list; 
at least one texture map; and 
at least one audio output buffer; 

a video signal generating circuit coupled to the coprocessor video interface, the video signal generating 
circuit generating a video signal for display on a color television set; 

a removable storage device including a housing, a security chip, a read only memory and at least one further 
memory device, the coprocessor including an arrangement that maps the read only memory and the further 
memory device into the main processor address space, the read only memory initially storing the graphics 
and audio processing microcode; and 

a connector that connects the coprocessor to the removable storage device; and 

a serial peripheral interface circuit coupled to the coprocessor serial interface, the serial peripheral interface 
circuit including a processor that performs serial interface functions and security functions and further includes 
a boot ROM that provides main processor initial program load instructions, the serial interface circuit processor 
being coupled to the removable storage device security chip through the connector. 

2. An interactive real time graphics display system comprising: at least one user input device; 

a main random access memory providing a common address space; 

a main processor coupled to address the main memory and also coupled to the user input device, the main 
processor storing instructions in and executing instructions from the main memory in real time response to 
inputs received from the user input device, the main processor storing at least display list graphics commands 
and play list audio commands in the main memory; 

a signal processor coupled to address the main memory, the signal processor fetching and executing microcode 
stored in the main memory, the signal processor reading the display list graphics commands and play list 
audio commands from the main memory, the signal processor generating audio sample data in response to 
the play list audio commands and generating graphics display commands in response to the display list, the 
signal processor storing the sample data in an audio output buffer allocated within the main memory; 
a display processor coupled to address the main memory, the display processor generating image data based 
at least in part on at least one texture map and other graphics data stored in the main memory, the display 
processor producing the image data in response to the graphics display commands, the display processor 
storing the image data in a color image frame buffer within the main memory; 

a video interface coupled to address the main memory, the video interface reading the color image frame 
buffer in synchronism with display raster scan; and 

an audio interface coupled to address the main memory, the audio interface reading the audio output buffer 
in synchronism with real time sound generation. 

3. A method of operating a graphics display system of the type including a main processor, a coprocessor coupled 
to the main processor, a main random access memory coupled to the coprocessor and addressable by both the 
main processor and the coprocessor, and a video signal generating arrangement that produces a video signal for 
display, the method including the following steps: 

(a) storing main processor code into the main memory; 

(b) executing, with the main processor, the main processor code stored by the storing step, said executing 
step including storing coprocessor code, a task list, at least one texture map and a color lookup table into the 
main memory; 

(c) fetching the task list from main memory; 

(d) processing the task list with the coprocessor in accordance at least in part with the coprocessor code stored 
by step (b), the processing step including performing the following steps: 

(1) loading the texture map and the color lookup table from the main memory into an on-chip texture 
memory; 

(2) performing at least one 3D geometric transformation on a set of vertices using a scalar unit and a 
vector unit including performing multiple calculations in parallel with the vector unit; 

(3) generating a triangle command based on the 3D geometric transformation; 

(4) generating a pixel value in response to the triangle command; 

(5) accessing the texture memory twice to provide color indexed texels based on the triangle command; 

(6) combining the texels with the generated pixel value to generate a combined pixel value; 

(7) accessing pixel values in a frame buffer stored in the main memory; 

(8) blending the combined pixel value with at least one pixel value stored in the frame buffer; 

(9) conditionally writing the combined pixel value into the frame buffer based on a comparison using a 
depth buffer stored in the main memory; 

(10) using said scalar and vector units to generate output audio samples including performing multiple 
calculations in parallel with the vector unit; and 

(11) storing the output audio samples into the main memory; 

(e) reading the frame buffer in real time synchronism with color television set line scanning and converting the 
frame buffer contents to a composite video signal; and 

(f) reading the stored output audio samples in real time and converting the stored audio samples into stereo 
sound. 
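
Steps (7) through (9) of the method above can be illustrated with a minimal sketch. It is not the claimed hardware: it assumes a smaller z value means nearer to the viewer, represents alpha as a float in [0, 1], folds the blend and the conditional write into a single hypothetical function `write_pixel`, and uses nested Python lists as stand-ins for the frame buffer and depth buffer held in main memory.

```python
def write_pixel(frame, depth, x, y, color, z, alpha):
    """Blend `color` into frame[y][x] only if `z` passes the depth test."""
    if z >= depth[y][x]:          # assumed convention: smaller z is nearer
        return False              # hidden: leave both buffers untouched
    old = frame[y][x]             # step (7): read the stored pixel value
    # step (8): blend the incoming pixel value with the stored one
    frame[y][x] = tuple(
        round(alpha * new + (1.0 - alpha) * prev) for new, prev in zip(color, old)
    )
    depth[y][x] = z               # step (9): the write happened, so record the depth
    return True


# A one-pixel frame buffer and depth buffer (illustrative data only).
frame = [[(0, 0, 0)]]
depth = [[100]]
first = write_pixel(frame, depth, 0, 0, (200, 100, 50), 40, 0.5)     # nearer: written
second = write_pixel(frame, depth, 0, 0, (255, 255, 255), 60, 1.0)   # farther: rejected
```

Because the comparison reads the depth buffer before anything is written, a rejected pixel costs a read but never disturbs the stored color.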

4. A process for generating at least one display mode control command for processing by a 3D graphics system, the 
process including the step of generating at least one set mode command having: 

a command identifier field including a six-bit binary value of 101111, and 
at least one of the following mode control fields: 

(k) an atomic primitive mode field that specifies whether to force writing a primitive to a frame buffer before 
reading a following primitive, 

(i) a cycle type mode field that selects a display pipeline cycle control mode, 

(h) a perspective texture enable mode field that selectively enables perspective texture correction, 

(g) a texture detail mode field that selectively enables texture detail processing, 

(f) a texture sharpen enable mode field that selectively enables texture sharpening, 

(e) a texture detail enable mode field that selectively enables texture level-of-detail processing, 

(d) an enable look up table mode field that selectively enables lookup of texture values from a color look 
up table, 

(c) a texture look up table type mode field that specifies type of texels in the color look up table, 

(b) a sample type mode field that specifies how texels should be sampled, 

(a) a mid texel mode field that specifies whether texels should be filtered using a 2x2 half texel interpolation, 

(Z) a first bilerp mode field that specifies whether a texture filter should bilinearly interpolate texels in 
pipeline cycle 0, 

(Y) a second bilerp mode field that specifies whether a texture filter should bilinearly interpolate texels in 
pipeline cycle 1, 

(X) a texel convert mode field that specifies whether a texel outputted by the texture filter during pipeline 
cycle 0 should be color converted, 

(W) a chroma key enable mode field that selectively enables chroma keying, 

(V2) an rgb dither select mode field that selects type of rgb dithering, 

(V1) an alpha dither select mode field that selects type of alpha dithering, 

(V) a plurality of blend modewords that specify blender parameters, 

(M) a force blend enable mode field that specifies whether the blender should be force enabled, 

(L) an alpha coverage select mode field that specifies whether coverage should be used to determine 
pixel alpha, 

(K) a coverage times alpha select mode field that specifies whether coverage multiplied by alpha should 
be used to determine pixel alpha and coverage, 

(J) a z mode select mode field that specifies z buffering mode, 

(I) a coverage destination mode field that specifies coverage destination, 

(H) a color on coverage mode field that specifies whether color should be updated only on coverage 
overflow, 

(G) an image read enable mode field that selectively enables color and/or coverage read/modify/write 
frame buffer memory access, 

(F) a z update enable mode field that selectively enables z buffer writing conditioned on whether color 
write is enabled, 

(E) a z compare enable mode field that specifies conditional color write enable on depth comparison, 

(D) an anti-alias enable mode field that allows blend enable using coverage, 

(C) a z source select mode field that chooses between primitive depth and pixel depth, 

(B) a dither alpha enable mode field that specifies whether random noise should be used in alpha compare, 

and 

(A) an alpha compare enable mode field that enables conditional color write on alpha compare. 
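
As an illustration of the set mode command recited above, the following sketch packs a command word from named mode fields. Only the six-bit identifier value 101111 comes from the claim. The 64-bit word size, the placement of the identifier in bits 61..56, the bit offsets in `FIELDS`, and the function names are all assumptions made for the example.

```python
SET_MODE_ID = 0b101111  # six-bit command identifier recited in the claim

# Hypothetical (bit offset, width) pairs; the claim does not fix these positions.
FIELDS = {
    "cycle_type": (52, 2),
    "z_update_enable": (5, 1),
    "z_compare_enable": (4, 1),
    "alpha_compare_enable": (0, 1),
}


def encode_set_mode(**values):
    """Pack the identifier and any supplied mode fields into one integer word."""
    word = SET_MODE_ID << 56  # assumed: identifier occupies bits 61..56
    for name, value in values.items():
        offset, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << offset
    return word


def decode_command_id(word):
    """Recover the six-bit identifier from a packed command word."""
    return (word >> 56) & 0x3F


cmd = encode_set_mode(cycle_type=1, z_compare_enable=1)
```

Keeping the layout in a table rather than in the packing code means a different field assignment only changes `FIELDS`, which suits a command that carries "at least one of" many optional fields.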

5. A process as in claim 1 including the step of generating a cycle type mode field that selects a display pipeline cycle 
control mode of 1 cycle per pixel mode, 2 cycles per pixel mode, copy mode and fill mode. 

6. A process as in claim 1 including the step of generating a texture look up table type mode field that selects between: 

(1) storing texels in a color look up table in RGBA format of 5 bits red, 5 bits green, 5 bits blue and 1 bit alpha, and 

(2) storing texels in a color look up table in intensity alpha format providing an 8 bit intensity value and an 8 
bit alpha value. 
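
The two 16-bit look up table entry formats can be unpacked as sketched below. The bit widths come from the claim; the bit order (red in the most significant bits with alpha in the least significant bit, and intensity in the high byte) is an assumption, as are the function names and the `"ia"` / `"rgba"` type labels.

```python
def decode_rgba5551(texel):
    """Split a 16-bit entry into 5-bit red, green, blue and a 1-bit alpha."""
    r = (texel >> 11) & 0x1F
    g = (texel >> 6) & 0x1F
    b = (texel >> 1) & 0x1F
    a = texel & 0x1
    return r, g, b, a


def decode_ia88(texel):
    """Split a 16-bit entry into an 8-bit intensity and an 8-bit alpha."""
    return (texel >> 8) & 0xFF, texel & 0xFF


def decode_lut_entry(texel, lut_type):
    """Dispatch on the (hypothetical) look up table type selector."""
    return decode_ia88(texel) if lut_type == "ia" else decode_rgba5551(texel)
```

Both formats fill the same 16 bits, so the type mode field is the only thing that tells the pipeline how to interpret an entry.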

7. A process as in claim 1 including the step of generating a sample type mode field that selects between: 

(1) point sampling, and 
(2) 2x2 array sampling. 
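
The difference between the two sample types can be sketched as follows. The claim names only the modes, so this example assumes that 2x2 array sampling feeds a bilinear interpolation of the four neighboring texels, that texture coordinates are non-negative, and that the texture is a single-channel nested list; the function names are hypothetical.

```python
def point_sample(tex, s, t):
    """Return the single texel containing coordinate (s, t)."""
    return tex[int(t)][int(s)]


def sample_2x2(tex, s, t):
    """Bilinearly interpolate the 2x2 block of texels around (s, t)."""
    s0, t0 = int(s), int(t)
    s1 = min(s0 + 1, len(tex[0]) - 1)   # clamp at the texture edge
    t1 = min(t0 + 1, len(tex) - 1)
    fs, ft = s - s0, t - t0             # fractional position inside the block
    top = tex[t0][s0] * (1 - fs) + tex[t0][s1] * fs
    bottom = tex[t1][s0] * (1 - fs) + tex[t1][s1] * fs
    return top * (1 - ft) + bottom * ft


tex = [[0, 100], [100, 200]]  # illustrative 2x2 single-channel texture
```

With this data, point sampling at the texture center returns the value of one corner texel, while 2x2 sampling returns the average of all four.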

8. A process as in claim 1 including the step of generating an rgb dither select mode field that selects between 
dithering based on: 

(1) a magic square matrix, 

(2) a bayer matrix, 

(3) noise, or 

(4) no dithering. 

9. A process as in claim 1 including the step of generating an alpha dithering select mode field that specifies dithering 
based on: 

(1) a predetermined pattern, 

(2) the negative of the predetermined pattern, 

(3) noise, or 

(4) no dithering. 

10. A process as in claim 1 including the step of generating a plurality of blend modewords that specify blender parameters 
specifying: 

selectively multiplying a first blender input during pipeline cycle 0, 
selectively multiplying the first blender input during pipeline cycle 1, 
selectively multiplying a second blender input during pipeline cycle 0, 
selectively multiplying the second blender input during pipeline cycle 1, 
selectively multiplying a third blender input during pipeline cycle 0, 
selectively multiplying the third blender input during pipeline cycle 1, 
selectively multiplying a fourth blender input during pipeline cycle 0, 
selectively multiplying the fourth blender input during pipeline cycle 1. 

11. A process as in claim 1 including the step of generating a coverage destination mode field that selects between 
the following coverage destination modes: 

(1) clamp, 

(2) wrap, 

(3) force to full coverage, and 

(4) save. 
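
The four coverage destination modes can be read as four ways of deciding which coverage value is written back for a pixel. The claim lists only the mode names, so the semantics below are assumptions: a 3-bit coverage value, clamp and wrap applied to the sum of the new and stored coverage, and save leaving the stored value unchanged. The function name is hypothetical.

```python
COVERAGE_MAX = 7  # assumed 3-bit coverage value per pixel


def coverage_destination(mode, new_cvg, stored_cvg):
    """Return the coverage value to write back under the selected mode."""
    total = new_cvg + stored_cvg
    if mode == "clamp":
        return min(total, COVERAGE_MAX)   # saturate at full coverage
    if mode == "wrap":
        return total & COVERAGE_MAX       # keep only the low three bits
    if mode == "full":
        return COVERAGE_MAX               # force to full coverage
    if mode == "save":
        return stored_cvg                 # leave the stored coverage as it is
    raise ValueError(f"unknown coverage destination mode: {mode}")
```

For example, with a new coverage of 5 and a stored coverage of 4, clamp yields 7, wrap yields 1, force to full coverage yields 7, and save yields 4.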

12. A process as in claim 1 including the step of generating a z mode select mode field that selects one of the following 
z buffering modes: 

(1) opaque, 

(2) interpenetrating, 

(3) transparent, and 

(4) decal. 

13. A system for generating at least one 3D display mode control command for processing by a 3D graphics system, 
the system including: 

at least one processor, 
at least one memory, and 

circuitry coupled to the processor and to the memory for providing at least one set mode command having: 
a command identifier field including a six-bit binary value of 101111, and 
at least one of the following mode control fields: 

(k) an atomic primitive mode field that specifies whether to force writing a primitive to a frame buffer before 
reading a following primitive, 

(i) a cycle type mode field that selects a display pipeline cycle control mode, 

(h) a perspective texture enable mode field that selectively enables perspective texture correction, 

(g) a texture detail mode field that selectively enables texture detail processing, 

(f) a texture sharpen enable mode field that selectively enables texture sharpening, 

(e) a texture detail enable mode field that selectively enables texture level-of-detail processing, 

(d) an enable look up table mode field that selectively enables lookup of texture values from a color look 
up table, 

(c) a texture look up table type mode field that specifies type of texels in the color look up table, 

(b) a sample type mode field that specifies how texels should be sampled, 

(a) a mid texel mode field that specifies whether texels should be filtered using a 2x2 half texel interpolation, 

(Z) a first bilerp mode field that specifies whether a texture filter should bilinearly interpolate texels in 
pipeline cycle 0, 

(Y) a second bilerp mode field that specifies whether a texture filter should bilinearly interpolate texels in 
pipeline cycle 1, 

(X) a texel convert mode field that specifies whether a texel outputted by the texture filter during pipeline 
cycle 0 should be color converted, 

(W) a chroma key enable mode field that selectively enables chroma keying, 
(V2) an rgb dither select mode field that selects type of rgb dithering, 
(V1) an alpha dither select mode field that selects type of alpha dithering, 
(V) a plurality of blend modewords that specify blender parameters, 

(M) a force blend enable mode field that specifies whether the blender should be force enabled, 

(L) an alpha coverage select mode field that specifies whether coverage should be used to determine 
pixel alpha, 

(K) a coverage times alpha select mode field that specifies whether coverage multiplied by alpha should 
be used to determine pixel alpha and coverage, 

(J) a z mode select mode field that specifies z buffering mode, 

(I) a coverage destination mode field that specifies coverage destination, 

(H) a color on coverage mode field that specifies whether color should be updated only on coverage 
overflow, 

(G) an image read enable mode field that selectively enables color and/or coverage read/modify/write 
frame buffer memory access, 

(F) a z update enable mode field that selectively enables z buffer writing conditioned on whether color 
write is enabled, 

(E) a z compare enable mode field that specifies conditional color write enable on depth comparison, 

(D) an anti-alias enable mode field that allows blend enable using coverage, 

(C) a z source select mode field that chooses between primitive depth and pixel depth, 

(B) a dither alpha enable mode field that specifies whether random noise should be used in alpha compare, 

and 

(A) an alpha compare enable mode field that enables conditional color write on alpha compare. 

14. A system as in claim 10 including means for generating a cycle type mode field that selects a display pipeline cycle 
control mode of 1 cycle per pixel mode, 2 cycles per pixel mode, copy mode and fill mode. 

15. A system as in claim 10 including means for providing a texture look up table type mode field that selects between: 

(1) storing texels in a color look up table in RGBA format of 5 bits red, 5 bits green, 5 bits blue and 1 bit alpha, and 

(2) storing texels in a color look up table in intensity alpha format providing an 8 bit intensity value and an 8 
bit alpha value. 

16. A system as in claim 10 including circuitry for providing a sample type mode field that selects between: 

(1) point sampling, and 

(2) 2x2 array sampling. 

17. A system as in claim 10 including circuitry for providing an rgb dither select mode field that selects between dithering 
based on: 

(1) a magic square matrix, 

(2) a bayer matrix, 

(3) noise, or 

(4) no dithering. 
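
A minimal sketch of an rgb dither select, reducing an 8-bit color channel to 5 bits, is given below. It uses the textbook 4x4 Bayer ordered-dither matrix, which is not necessarily the matrix used by the disclosed hardware, and it omits the magic square option because the claim does not give that matrix. The function name, the mode strings and the 8-bit to 5-bit reduction are assumptions.

```python
import random

# Textbook 4x4 Bayer matrix (values 0..15); assumed, not taken from the patent.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def dither_to_5_bits(value, x, y, mode="bayer", rng=random):
    """Quantize an 8-bit channel to 5 bits after adding a sub-step threshold."""
    if mode == "bayer":
        threshold = BAYER_4X4[y % 4][x % 4] // 2   # scale 0..15 down to 0..7
    elif mode == "noise":
        threshold = rng.randrange(8)               # random value in 0..7
    else:                                          # "none": plain truncation
        threshold = 0
    return min(value + threshold, 255) >> 3
```

The threshold is always smaller than one 5-bit output step (8 input levels), so dithering can move a pixel up by at most one output level, and the screen position decides which pixels are moved.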

18. A system as in claim 10 including means for generating an alpha dithering select mode field that specifies dithering 
based on: 

(1) a predetermined pattern, 

(2) the negative of the predetermined pattern, 

(3) noise, or 

(4) no dithering. 

19. A system as in claim 10 including means for generating a plurality of blend modewords that specify blender parameters 
specifying: 

selectively multiplying a first blender input during pipeline cycle 0, 
selectively multiplying the first blender input during pipeline cycle 1, 
selectively multiplying a second blender input during pipeline cycle 0, 
selectively multiplying the second blender input during pipeline cycle 1, 
selectively multiplying a third blender input during pipeline cycle 0, 
selectively multiplying the third blender input during pipeline cycle 1, 
selectively multiplying a fourth blender input during pipeline cycle 0, 
selectively multiplying the fourth blender input during pipeline cycle 1. 

20. A system as in claim 10 including circuitry for generating a coverage destination mode field that selects between 
the following coverage destination modes: 

(1) clamp, 

(2) wrap, 

(3) force to full coverage, and 

(4) save. 

21. A system as in claim 10 including circuitry for providing a z mode select mode field that selects one of the following 
z buffering modes: 

(1) opaque, 

(2) interpenetrating, 

(3) transparent, and 

(4) decal. 

22. In a 3D graphics system, a process for interpreting at least one set mode command including the steps of: 

(a) interpreting a command identifier field including a six-bit binary value of 101111, 

(b) interpreting at least one of the following mode control fields: 

(k) an atomic primitive mode field that specifies whether to force writing a primitive to a frame buffer before 
reading a following primitive, 

(i) a cycle type mode field that selects a display pipeline cycle control mode, 

(h) a perspective texture enable mode field that selectively enables perspective texture correction, 

(g) a texture detail mode field that selectively enables texture detail processing, 

(f) a texture sharpen enable mode field that selectively enables texture sharpening, 

(e) a texture detail enable mode field that selectively enables texture level-of-detail processing, 

(d) an enable look up table mode field that selectively enables lookup of texture values from a color look up 
table, 

(c) a texture look up table type mode field that specifies type of texels in the color look up table, 

(b) a sample type mode field that specifies how texels should be sampled, 

(a) a mid texel mode field that specifies whether texels should be filtered using a 2x2 half texel interpolation, 

(Z) a first bilerp mode field that specifies whether a texture filter should bilinearly interpolate texels in pipeline 
cycle 0, 

(Y) a second bilerp mode field that specifies whether a texture filter should bilinearly interpolate texels in 
pipeline cycle 1, 

(X) a texel convert mode field that specifies whether a texel outputted by the texture filter during pipeline cycle 
0 should be color converted, 

(W) a chroma key enable mode field that selectively enables chroma keying, 
(V2) an rgb dither select mode field that selects type of rgb dithering, 
(V1) an alpha dither select mode field that selects type of alpha dithering, 
(V) a plurality of blend modewords that specify blender parameters, 

(M) a force blend enable mode field that specifies whether the blender should be force enabled, 

(L) an alpha coverage select mode field that specifies whether coverage should be used to determine pixel 
alpha, 

(K) a coverage times alpha select mode field that specifies whether coverage multiplied by alpha should be 
used to determine pixel alpha and coverage, 

(J) a z mode select mode field that specifies z buffering mode, 

(I) a coverage destination mode field that specifies coverage destination, 

(H) a color on coverage mode field that specifies whether color should be updated only on coverage overflow, 
(G) an image read enable mode field that selectively enables color and/or coverage read/modify/write frame 
buffer memory access, 

(F) a z update enable mode field that selectively enables z buffer writing conditioned on whether color write 
is enabled, 

(E) a z compare enable mode field that specifies conditional color write enable on depth comparison, 

(D) an anti-alias enable mode field that allows blend enable using coverage, 

(C) a z source select mode field that chooses between primitive depth and pixel depth, 

(B) a dither alpha enable mode field that specifies whether random noise should be used in alpha compare, 

(A) an alpha compare enable mode field that enables conditional color write on alpha compare, and 

(c) generating an image based at least in part on step (b). 

23. A process as in claim 19 including the step of interpreting a cycle type mode field that selects a display pipeline 
cycle control mode of 1 cycle per pixel mode, 2 cycles per pixel mode, copy mode and fill mode. 

24. A process as in claim 19 including the step of interpreting a texture look up table type mode field that selects 
between: 

(1) storing texels in a color look up table in RGBA format of 5 bits red, 5 bits green, 5 bits blue and 1 bit alpha, and 

(2) storing texels in a color look up table in intensity alpha format providing an 8 bit intensity value and an 8 
bit alpha value. 

25. A process as in claim 19 including the step of interpreting a sample type mode field that selects between: 

(1) point sampling, and 

(2) 2x2 array sampling. 

26. A process as in claim 19 including the step of interpreting an rgb dither select mode field that selects between 
dithering based on: 

(1) a magic square matrix, 

(2) a bayer matrix, 

(3) noise, or 

(4) no dithering. 

27. A process as in claim 19 including the step of interpreting an alpha dithering select mode field that specifies dithering 
based on: 

(1) a predetermined pattern, 

(2) the negative of the predetermined pattern, 

(3) noise, or 

(4) no dithering. 

28. A process as in claim 19 including the step of interpreting a plurality of blend modewords that specify blender 
parameters specifying: 

selectively multiplying a first blender input during pipeline cycle 0, 
selectively multiplying the first blender input during pipeline cycle 1, 
selectively multiplying a second blender input during pipeline cycle 0, 
selectively multiplying the second blender input during pipeline cycle 1, 
selectively multiplying a third blender input during pipeline cycle 0, 
selectively multiplying the third blender input during pipeline cycle 1, 
selectively multiplying a fourth blender input during pipeline cycle 0, 
selectively multiplying the fourth blender input during pipeline cycle 1. 

29. A process as in claim 19 including the step of interpreting a coverage destination mode field that selects between 
the following coverage destination modes: 

(1) clamp, 

(2) wrap, 

(3) force to full coverage, and 

(4) save. 

30. A process as in claim 19 including the step of interpreting a z mode select mode field that selects one of the 
following z buffering modes: 

(1) opaque, 

(2) interpenetrating, 

(3) transparent, and 

(4) decal. 

31. A 3D graphics system for interpreting at least one set mode command having a command identifier field including 
a six-bit binary value of 101111, the system including: 

a first decoder that interprets a command identifier field including a six-bit binary value of 101111, 

(k) circuitry for interpreting an atomic primitive mode field that specifies whether to force writing a primitive to 
a frame buffer before reading a following primitive, 

(i) circuitry for interpreting a cycle type mode field that selects a display pipeline cycle control mode, 

(h) circuitry for interpreting a perspective texture enable mode field that selectively enables perspective texture 
correction, 

(g) circuitry for interpreting a texture detail mode field that selectively enables texture detail processing, 
(f) circuitry for interpreting a texture sharpen enable mode field that selectively enables texture sharpening, 
(e) circuitry for interpreting a texture detail enable mode field that selectively enables texture level-of-detail 
processing, 

(d) circuitry for interpreting an enable look up table mode field that selectively enables lookup of texture values 
from a color look up table, 

(c) circuitry for interpreting a texture look up table type mode field that specifies type of texels in the color look 
up table, 

(b) circuitry for interpreting a sample type mode field that specifies how texels should be sampled, 

(a) circuitry for interpreting a mid texel mode field that specifies whether texels should be filtered using a 2x2 
half texel interpolation, 

(Z) circuitry for interpreting a first bilerp mode field that specifies whether a texture filter should bilinearly 
interpolate texels in pipeline cycle 0, 

(Y) circuitry for interpreting a second bilerp mode field that specifies whether a texture filter should bilinearly 
interpolate texels in pipeline cycle 1, 

(X) circuitry for interpreting a texel convert mode field that specifies whether a texel outputted by the texture 
filter during pipeline cycle 0 should be color converted, 

(W) circuitry for interpreting a chroma key enable mode field that selectively enables chroma keying, 

(V2) circuitry for interpreting an rgb dither select mode field that selects type of rgb dithering, 

(V1) circuitry for interpreting an alpha dither select mode field that selects type of alpha dithering, 

(V) circuitry for interpreting a plurality of blend modewords that specify blender parameters, 

(M) circuitry for interpreting a force blend enable mode field that specifies whether the blender should be force 
enabled, 

(L) circuitry for interpreting an alpha coverage select mode field that specifies whether coverage should be 
used to determine pixel alpha, 

(K) circuitry for interpreting a coverage times alpha select mode field that specifies whether coverage multiplied 
by alpha should be used to determine pixel alpha and coverage, 

(J) circuitry for interpreting a z mode select mode field that specifies z buffering mode, 

(I) circuitry for interpreting a coverage destination mode field that specifies coverage destination, 

(H) circuitry for interpreting a color on coverage mode field that specifies whether color should be updated 
only on coverage overflow, 

(G) circuitry for interpreting an image read enable mode field that selectively enables color and/or coverage 
read/modify/write frame buffer memory access, 

(F) circuitry for interpreting a z update enable mode field that selectively enables z buffer writing conditioned 
on whether color write is enabled, 

(E) circuitry for interpreting a z compare enable mode field that specifies conditional color write enable on 
depth comparison, 

(D) circuitry for interpreting an anti-alias enable mode field that allows blend enable using coverage, 

(C) circuitry for interpreting a z source select mode field that chooses between primitive depth and pixel depth, 
(B) circuitry for interpreting a dither alpha enable mode field that specifies whether random noise should be 
used in alpha compare, 

(A) circuitry for interpreting an alpha compare enable mode field that enables conditional color write on alpha 
compare, and 

(c) circuitry coupled to above-mentioned circuitry (a)-(k), (A)-(M), (V1), (V2) and (W)-(Z) for generating an 
image. 

32. An apparatus as in claim 28 including means for interpreting a cycle type mode field that selects a display pipeline 
cycle control mode of 1 cycle per pixel mode, 2 cycles per pixel mode, copy mode and fill mode. 

33. An apparatus as in claim 28 including means for interpreting a texture look up table type mode field that selects 
between: 

(1) storing texels in a color look up table in RGBA format of 5 bits red, 5 bits green, 5 bits blue and 1 bit alpha, and 

(2) storing texels in a color look up table in intensity alpha format providing an 8 bit intensity value and an 8 
bit alpha value. 

34. An apparatus as in claim 28 including means for interpreting a sample type mode field that selects between: 

(1) point sampling, and 

(2) 2x2 array sampling. 

35. An apparatus as in claim 28 including means for interpreting an rgb dither select mode field that selects between 
dithering based on: 

(1) a magic square matrix, 

(2) a bayer matrix, 

(3) noise, or 

(4) no dithering. 

36. An apparatus as in claim 28 including means for interpreting an alpha dithering select mode field that specifies 
dithering based on: 

(1) a predetermined pattern, 

(2) the negative of the predetermined pattern, 

(3) noise, or 

(4) no dithering. 

37. An apparatus as in claim 28 including means for interpreting a plurality of blend modewords that specify blender 
parameters specifying: 

selectively multiplying a first blender input during pipeline cycle 0, 
selectively multiplying the first blender input during pipeline cycle 1, 
selectively multiplying a second blender input during pipeline cycle 0, 
selectively multiplying the second blender input during pipeline cycle 1, 
selectively multiplying a third blender input during pipeline cycle 0, 
selectively multiplying the third blender input during pipeline cycle 1, 
selectively multiplying a fourth blender input during pipeline cycle 0, 
selectively multiplying the fourth blender input during pipeline cycle 1. 

38. An apparatus as in claim 28 including means for interpreting a coverage destination mode field that selects between 
the following coverage destination modes: 
(1) clamp,

(2) wrap, 

(3) force to full coverage, and 

(4) save. 

39. An apparatus as in claim 28 including means for interpreting a z mode select mode field that selects one of the 
following z buffering modes: 

(1) opaque, 
(2) interpenetrating,

(3) transparent, and 

(4) decal. 

40. A storage medium for use with a 3D graphics system, the storage medium storing at least one 3D display mode 
control command including:

a command identifier field including a six-bit binary value of 101111, and 
at least one of the following mode control fields: 

(k) an atomic primitive mode field that specifies whether to force writing a primitive to a frame buffer before
reading a following primitive,

(i) a cycle type mode field that selects a display pipeline cycle control mode, 
(h) a perspective texture enable mode field that selectively enables perspective texture correction, 
(g) a texture detail mode field that selectively enables texture detail processing, 
(f) a texture sharpen enable mode field that selectively enables texture sharpening,

(e) a texture detail enable mode field that selectively enables texture level-of-detail processing, 

(d) an enable look up table mode field that selectively enables lookup of texture values from a color look
up table,

(c) a texture look up table type mode field that specifies type of texels in the color look up table, 
(b) a sample type mode field that specifies how texels should be sampled,

(a) a mid texel mode field that specifies whether texels should be filtered using a 2x2 half texel interpolation, 
(Z) a first bilerp mode field that specifies whether a texture filter should bilinearly interpolate texels in 
pipeline cycle 0, 

(Y) a second bilerp mode field that specifies whether a texture filter should bilinearly interpolate texels in 
pipeline cycle 1,

(X) a texel convert mode field that specifies whether a texel outputted by the texture filter during pipeline 
cycle 0 should be color converted, 

(W) a chroma key enable mode field that selectively enables chroma keying, 
(V2) an rgb dither select mode field that selects type of rgb dithering, 
(V1) an alpha dither select mode field that selects type of alpha dithering,

(V) a plurality of blend modewords that specify blender parameters, 

(M) a force blend enable mode field that specifies whether the blender should be force enabled, 

(L) an alpha coverage select mode field that specifies whether coverage should be used to determine
pixel alpha,

(K) a coverage times alpha select mode field that specifies whether coverage multiplied by alpha should
be used to determine pixel alpha and coverage,
(J) a z mode select mode field that specifies z buffering mode, 
(I) a coverage destination mode field that specifies coverage destination, 

(H) a color on coverage mode field that specifies whether color should be updated only on coverage 
overflow,

(G) an image read enable mode field that selectively enables color and/or coverage read/modify/write 
frame buffer memory access, 

(F) a z update enable mode field that selectively enables z buffer writing conditioned on whether color 
write is enabled, 

(E) a z compare enable mode field that specifies conditional color write enable on depth comparison,

(D) an anti-alias enable mode field that allows blend enable using coverage, 
(C) a z source select mode field that chooses between primitive depth and pixel depth,
(B) a dither alpha enable mode field that specifies whether random noise should be used in alpha compare,
and

(A) an alpha compare enable mode field that enables conditional color write on alpha compare.
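
As an illustration of how software might recognize such a mode control command, the Python sketch below tests the six-bit identifier 101111 and then reads a cycle type field. The claims give the identifier value and the four cycle control modes but not any bit positions; the 64-bit command width, the identifier position (bits 61 to 56), the cycle type position (bits 53 to 52), the numeric ordering of the modes and all names used here are assumptions made for illustration only.

```python
MODE_CONTROL_ID = 0b101111  # six-bit identifier value recited in the claim

# Assumed layout (not stated in the claims): one 64-bit command word.
ID_SHIFT = 56          # assumption: identifier occupies bits 61..56
CYCLE_TYPE_SHIFT = 52  # assumption: cycle type occupies bits 53..52
CYCLE_TYPES = ("1 cycle per pixel", "2 cycles per pixel", "copy", "fill")  # assumed order


def extract_field(word, shift, width):
    """Return the `width`-bit field of `word` that starts at bit `shift`."""
    return (word >> shift) & ((1 << width) - 1)


def decode_mode_command(word):
    """Return the cycle control mode name, or None if the identifier differs."""
    if extract_field(word, ID_SHIFT, 6) != MODE_CONTROL_ID:
        return None
    return CYCLE_TYPES[extract_field(word, CYCLE_TYPE_SHIFT, 2)]
```

The same `extract_field` helper could be applied to any of the other mode fields once their positions and widths are known.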

41. A storage medium as in claim 37 including means for storing a cycle type mode field that selects a display pipeline
cycle control mode of 1 cycle per pixel mode, 2 cycles per pixel mode, copy mode and fill mode.

42. A storage medium as in claim 37 including means for storing a texture look up table type mode field that selects
between:

(1) storing texels in a color look up table in RGBA format of 5 bits red, 5 bits green, 5 bits blue and 1 bit alpha, and

(2) storing texels in a color look up table in intensity alpha format providing an 8 bit intensity value and an 8
bit alpha value.

43. A storage medium as in claim 37 including means for storing a sample type mode field that selects between:

(1) point sampling, and 

(2) 2x2 array sampling. 

44. A storage medium as in claim 37 including means for storing an rgb dither select mode field that selects between 
dithering based on:

(1) a magic square matrix, 

(2) a bayer matrix, 

(3) noise, or 

(4) no dithering.

45. A storage medium as in claim 37 including means for storing an alpha dithering select mode field that specifies 
dithering based on: 

(1) a predetermined pattern,

(2) the negative of the predetermined pattern, 

(3) noise, or 

(4) no dithering. 

46. A storage medium as in claim 37 including means for storing a plurality of blend modewords that specify blender
parameters specifying: 

selectively multiplying a first blender input during pipeline cycle 0, 

selectively multiplying the first blender input during pipeline cycle 1, 
selectively multiplying a second blender input during pipeline cycle 0,

selectively multiplying the second blender input during pipeline cycle 1,

selectively multiplying a third blender input during pipeline cycle 0,

selectively multiplying the third blender input during pipeline cycle 1,

selectively multiplying a fourth blender input during pipeline cycle 0,

selectively multiplying the fourth blender input during pipeline cycle 1.

47. A storage medium as in claim 37 including means for storing a coverage destination mode field that selects between 
the following coverage destination modes: 

(1) clamp,

(2) wrap, 

(3) force to full coverage, and 

(4) save. 

48. A storage medium as in claim 37 including means for storing a z mode select mode field that selects one of the
following z buffering modes: 

(1) opaque, 
(2) interpenetrating,

(3) transparent, and

(4) decal.

49. A process for generating at least one display mode control command for processing by a 3D graphics system, the 
process including the step of generating at least one set mode command having: 

a command identifier field including a six-bit binary value of 111100, and
at least one of the following additional fields: 

at least one combiner subtract mode control field that specifies subtracting at least one color space value from 
a color combiner, 

at least one combiner multiply mode control field that specifies multiplying a color combiner input by at least 
one color space value, and 

at least one combiner add control field that specifies a color combiner adder input. 

50. A process as in claim 46 further including the step of generating the following combiner subtract mode control
fields to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a subtract Source A,

(2) a subtract source control field specifying a subtract Source B,

(3) a multiply source control field specifying a multiply source C, and

(4) an add source control field specifying an add source D. 
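
The function (A-B)*C + D recited above can be sketched in Python as follows. The claim specifies only that control fields select the sources A, B, C and D; the source names (`texel`, `shade`, `factor`), the normalized 0.0 to 1.0 value range and the final clamp are assumptions made for this illustration.

```python
def combine(inputs, select):
    """Evaluate (A - B) * C + D with each operand chosen by a select value.

    inputs: dict mapping a hypothetical source name to a value in 0.0..1.0.
    select: dict with keys "A", "B", "C", "D" naming which source feeds
            each operand, playing the role of the multiplexer select fields.
    """
    a, b, c, d = (inputs[select[key]] for key in "ABCD")
    result = (a - b) * c + d
    # Clamp to the assumed normalized range.
    return min(1.0, max(0.0, result))


# Example: blend from `shade` toward `texel` by `factor`,
# i.e. (texel - shade) * factor + shade.
sources = {"texel": 1.0, "shade": 0.5, "factor": 0.5}
blend = combine(sources, {"A": "texel", "B": "shade", "C": "factor", "D": "shade"})
```

Choosing the same source for B and D turns the function into a linear interpolation, which is one reason a single (A-B)*C + D form can cover several common blending operations.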

51. A process as in claim 46 further including the step of generating the following combiner subtract mode control 
fields to specify multiplexer sources for implementing the function (A-B)*C + D:

(1) a subtract source control field specifying a RGB component subtract Source A,

(2) a subtract source control field specifying an alpha component subtract Source A,

(3) a subtract source control field specifying a RGB component subtract Source B,

(4) a subtract source control field specifying an alpha component subtract Source B,

(5) a multiply source control field specifying an RGB component multiply source C, 

(6) a multiply source control field specifying an alpha component multiply source C, 

(7) an add source control field specifying an RGB component add source D, and 

(8) an add source control field specifying an alpha component add source D. 

52. A process as in claim 46 further including the step of generating the following combiner subtract mode control
fields to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 0; 

(2) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 1;

(3) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 0;

(4) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 1;

(5) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 0;

(6) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 1;

(7) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 0;

(8) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 1;

(9) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 0;

(10) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 0;

(11) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 1;

(12) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 1;

(13) an add source control field specifying an RGB component add source D for pipeline cycle 0;

(14) an add source control field specifying an alpha component add source D for pipeline cycle 0;

(15) an add source control field specifying an RGB component add source D for pipeline cycle 1; and

(16) an add source control field specifying an alpha component add source D for pipeline cycle 1. 

53. A process as in claim 46 further including:

generating at least first, second, third and fourth multiplexer select values specifying corresponding inputs for
an RGB color combiner channel; and

generating at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding inputs
for an alpha color combiner channel.

54. A process as in claim 46 further including:

generating at least first, second, third and fourth multiplexer select values specifying corresponding inputs for 
a pipeline cycle 0 color combine operation; and 

generating at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding inputs 
for a pipeline cycle 1 color combine operation.

55. A system for generating at least one display mode control command for processing by a 3D graphics system, the 
system including: 

at least one processor,

at least one memory, and 

circuitry coupled to the processor and to the memory for providing at least one set mode command having: 
a command identifier field including a six-bit binary value of 111100, and
at least one of the following additional fields:
at least one combiner subtract mode control field that specifies subtracting at least one color space value from
a color combiner,

at least one combiner multiply mode control field that specifies multiplying a color combiner input by at least 
one color space value, and 

at least one combiner add control field that specifies a color combiner adder input. 

56. A system as in claim 52 further including means for providing the following combiner subtract mode control fields 
to specify multiplexer sources for implementing the function (A-B)*C + D:

(1) a subtract source control field specifying a subtract Source A,

(2) a subtract source control field specifying a subtract Source B,

(3) a multiply source control field specifying a multiply source C, and 

(4) an add source control field specifying an add source D. 



57. A system as in claim 52 further including means for generating the following combiner subtract mode control fields 
to specify multiplexer sources for implementing the function (A-B)*C + D:

(1) a subtract source control field specifying a RGB component subtract Source A, 

(2) a subtract source control field specifying an alpha component subtract Source A, 

(3) a subtract source control field specifying a RGB component subtract Source B, 
(4) a subtract source control field specifying an alpha component subtract Source B,

(5) a multiply source control field specifying an RGB component multiply source C,

(6) a multiply source control field specifying an alpha component multiply source C,

(7) an add source control field specifying an RGB component add source D, and 

(8) an add source control field specifying an alpha component add source D. 

58. A system as in claim 52 further including circuitry for generating the following combiner subtract mode control fields 
to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 0;

(2) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 1;

(3) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 0;

(4) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 1;

(5) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 0;

(6) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 1;

(7) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 0;

(8) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 1;

(9) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 0;

(10) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 0;

(11) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 1;

(12) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 1;

(13) an add source control field specifying an RGB component add source D for pipeline cycle 0;

(14) an add source control field specifying an alpha component add source D for pipeline cycle 0;

(15) an add source control field specifying an RGB component add source D for pipeline cycle 1; and

(16) an add source control field specifying an alpha component add source D for pipeline cycle 1. 

59. A system as in claim 52 further including: 

circuitry for providing at least first, second, third and fourth multiplexer select values specifying corresponding 
inputs for an RGB color combiner channel; and 

means for providing at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding 
inputs for an alpha color combiner channel. 

60. A system as in claim 52 further including: 

circuitry for generating at least first, second, third and fourth multiplexer select values specifying corresponding 
inputs for a pipeline cycle 0 color combine operation; and 

circuitry for generating at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding
inputs for a pipeline cycle 1 color combine operation.

61. In a 3D graphics display system, a process for interpreting at least one set mode command comprising: 

(a) interpreting a command identifier field including a six-bit binary value of 111100,

(b) interpreting at least one of the following additional fields: 

at least one combiner subtract mode control field that specifies subtracting at least one color space value 
from a color combiner, 

at least one combiner multiply mode control field that specifies multiplying a color combiner input by at 
least one color space value, and 

at least one combiner add control field that specifies a color combiner adder input, and 

(c) generating an image based at least in part on step (b). 

62. A process as in claim 58 further including the step of interpreting the following combiner subtract mode control 
fields to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a subtract Source A, 

(2) a subtract source control field specifying a subtract Source B, 

(3) a multiply source control field specifying a multiply source C, and

(4) an add source control field specifying an add source D. 

63. A process as in claim 58 further including the step of interpreting the following combiner subtract mode control 
fields to specify multiplexer sources for implementing the function (A-B)*C + D:

(1) a subtract source control field specifying a RGB component subtract Source A,

(2) a subtract source control field specifying an alpha component subtract Source A, 

(3) a subtract source control field specifying a RGB component subtract Source B, 

(4) a subtract source control field specifying an alpha component subtract Source B, 

(5) a multiply source control field specifying an RGB component multiply source C, 

(6) a multiply source control field specifying an alpha component multiply source C, 

(7) an add source control field specifying an RGB component add source D, and 

(8) an add source control field specifying an alpha component add source D. 

64. A process as in claim 58 further including the step of interpreting the following combiner subtract mode control 
fields to specify multiplexer sources for implementing the function (A-B)*C + D:

(1) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 0;

(2) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 1;

(3) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 0;

(4) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 1;

(5) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 0;

(6) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 1;

(7) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 0;

(8) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 1;

(9) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 0;

(10) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 0;

(11) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 1;

(12) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 1;

(13) an add source control field specifying an RGB component add source D for pipeline cycle 0;

(14) an add source control field specifying an alpha component add source D for pipeline cycle 0;

(15) an add source control field specifying an RGB component add source D for pipeline cycle 1; and

(16) an add source control field specifying an alpha component add source D for pipeline cycle 1.

65. A process as in claim 58 further including: 

interpreting at least first, second, third and fourth multiplexer select values specifying corresponding inputs 
for an RGB color combiner channel; and

interpreting at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding inputs 
for an alpha color combiner channel. 

66. A process as in claim 58 further including: 

interpreting at least first, second, third and fourth multiplexer select values specifying corresponding inputs 
for a pipeline cycle 0 color combine operation; and 

interpreting at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding inputs 
for a pipeline cycle 1 color combine operation. 

67. A 3D graphics display system for interpreting at least one set mode command including a six-bit binary value of 
111100, the system comprising: 

(a) means for interpreting a command identifier field including a six-bit binary value of 111100, 
(b) means for generating at least one control signal based on at least one of the following additional fields:

at least one combiner subtract mode control field that specifies subtracting at least one color space value 
from a color combiner, 

at least one combiner multiply mode control field that specifies multiplying a color combiner input by at 
least one color space value, and

at least one combiner add control field that specifies a color combiner adder input, and 

(c) means for generating an image based at least in part on the control signal. 

68. An apparatus as in claim 64 further including a combiner circuit that interprets the following combiner subtract
mode control fields to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a subtract Source A, 

(2) a subtract source control field specifying a subtract Source B, 

(3) a multiply source control field specifying a multiply source C, and

(4) an add source control field specifying an add source D. 

69. An apparatus as in claim 64 further including a combiner that interprets the following combiner subtract mode 
control fields to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a RGB component subtract Source A, 

(2) a subtract source control field specifying an alpha component subtract Source A, 

(3) a subtract source control field specifying a RGB component subtract Source B,

(4) a subtract source control field specifying an alpha component subtract Source B,

(5) a multiply source control field specifying an RGB component multiply source C,

(6) a multiply source control field specifying an alpha component multiply source C,

(7) an add source control field specifying an RGB component add source D, and

(8) an add source control field specifying an alpha component add source D.

70. An apparatus as in claim 64 further including a combiner that combines color signals based on the following
combiner subtract mode control fields to specify multiplexer sources for implementing the function (A-B)*C + D:

(1) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 0;

(2) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 1;

(3) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 0;

(4) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 1;

(5) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 0;

(6) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 1;

(7) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 0;

(8) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 1;

(9) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 0;

(10) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 0;

(11) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 1;

(12) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 1;

(13) an add source control field specifying an RGB component add source D for pipeline cycle 0;

(14) an add source control field specifying an alpha component add source D for pipeline cycle 0;

(15) an add source control field specifying an RGB component add source D for pipeline cycle 1; and

(16) an add source control field specifying an alpha component add source D for pipeline cycle 1.

71. An apparatus as in claim 64 further including:

means for interpreting at least first, second, third and fourth multiplexer select values specifying corresponding
inputs for an RGB color combiner channel; and

means for interpreting at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding
inputs for an alpha color combiner channel.

72. An apparatus as in claim 64 further including:

means for interpreting at least first, second, third and fourth multiplexer select values specifying corresponding 
inputs for a pipeline cycle 0 color combine operation; and 

means for interpreting at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding
inputs for a pipeline cycle 1 color combine operation.

73. A storage medium for use with a 3D graphics system, the storage medium storing at least one display mode control 
command having: 

a command identifier field including a six-bit binary value of 111100, and 
at least one of the following additional fields:

at least one combiner subtract mode control field that specifies subtracting at least one color space value from 
a color combiner, 

at least one combiner multiply mode control field that specifies multiplying a color combiner input by at least 
one color space value, and 
at least one combiner add control field that specifies a color combiner adder input.

74. A storage medium as in claim 70 further including means for storing the following combiner subtract mode control 
fields to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a subtract Source A,

(2) a subtract source control field specifying a subtract Source B,

(3) a multiply source control field specifying a multiply source C, and

(4) an add source control field specifying an add source D. 

75. A storage medium as in claim 70 further including means for storing the following combiner subtract mode control
fields to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a RGB component subtract Source A,

(2) a subtract source control field specifying an alpha component subtract Source A,

(3) a subtract source control field specifying a RGB component subtract Source B,

(4) a subtract source control field specifying an alpha component subtract Source B, 

(5) a multiply source control field specifying an RGB component multiply source C, 

(6) a multiply source control field specifying an alpha component multiply source C, 
(7) an add source control field specifying an RGB component add source D, and

(8) an add source control field specifying an alpha component add source D. 

76. A storage medium as in claim 70 further including means for storing the following combiner subtract mode control 
fields to specify multiplexer sources for implementing the function (A-B)*C + D: 

(1) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 0;

(2) a subtract source control field specifying a RGB component subtract Source A for pipeline cycle 1;

(3) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 0;

(4) a subtract source control field specifying an alpha component subtract Source A for pipeline cycle 1;

(5) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 0;

(6) a subtract source control field specifying a RGB component subtract Source B for pipeline cycle 1;

(7) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 0;

(8) a subtract source control field specifying an alpha component subtract Source B for pipeline cycle 1;

(9) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 0;

(10) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 0;

(11) a multiply source control field specifying an RGB component multiply source C for pipeline cycle 1;

(12) a multiply source control field specifying an alpha component multiply source C for pipeline cycle 1;

(13) an add source control field specifying an RGB component add source D for pipeline cycle 0;

(14) an add source control field specifying an alpha component add source D for pipeline cycle 0;

(15) an add source control field specifying an RGB component add source D for pipeline cycle 1; and

(16) an add source control field specifying an alpha component add source D for pipeline cycle 1.

77. A storage medium as in claim 70 further including: 

means for storing at least first, second, third and fourth multiplexer select values specifying corresponding
inputs for an RGB color combiner channel; and

means for storing at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding 
inputs for an alpha color combiner channel. 

78. A storage medium as in claim 70 further including:

means for storing at least first, second, third and fourth multiplexer select values specifying corresponding 
inputs for a pipeline cycle 0 color combine operation; and 

means for storing at least fifth, sixth, seventh and eighth multiplexer select values specifying corresponding 
inputs for a pipeline cycle 1 color combine operation.

79. A process for generating at least one color image mode command for processing by a 3D graphics system, the 
process including the step of generating at least one command including: 

a command identifier field including a six-bit binary value within the set of 111111 and 111101;

an image data format parameter; 
a color element size parameter; 
an image width parameter; and 
a base address parameter. 



80. A process as in claim 76 further including the step of generating the image data format parameter selecting between 
the following: 
(a) rgba,

(b) yuv, 

(c) color index, 

(d) intensity alpha, and 
(e) alpha.

81. The process as in claim 76 further including the step of generating the color element size parameter selecting 
between the following: 

(a) 4 bit wide color element value,

(b) 8 bit wide color element value, 

(c) 16 bit wide color element value, and 

(d) 32 bit wide color element value. 

82. The process as in claim 76 further including the step of generating the image width parameter value specifying
the width in pixels of an image stored in memory. 

83. The process as in claim 76 further including the step of generating the base address parameter specifying the 
base address in main memory of the top left corner of the image.
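
To illustrate how the image width, color element size and base address parameters could work together, the Python sketch below computes where the color element for pixel (x, y) would sit in memory. It assumes rows are stored contiguously in row-major order with no padding, which the claims do not state; the function name and the returned (byte address, bit offset) pair are likewise choices made for this example.

```python
def color_element_address(base_address, width_pixels, x, y, bits_per_element):
    """Locate the color element for pixel (x, y) of an image in memory.

    base_address is the address of the top left corner of the image and
    width_pixels is the image width in pixels. Rows are assumed to be
    tightly packed in row-major order. Returns (byte_address, bit_offset);
    bit_offset is nonzero only for elements that are not byte aligned.
    """
    if bits_per_element not in (4, 8, 16, 32):
        raise ValueError("unsupported color element size")
    if not (0 <= x < width_pixels) or y < 0:
        raise ValueError("pixel coordinate out of range")
    bit_position = (y * width_pixels + x) * bits_per_element
    return base_address + bit_position // 8, bit_position % 8
```

For a 320 pixel wide image of 16 bit elements, pixel (10, 2) lies 650 elements, or 1300 bytes, past the base address.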

84. A system for providing at least one color image mode command for processing by a 3D graphics system, the system
including:

at least one processor,
at least one memory, and

circuitry coupled to the processor and to the memory for generating at least one command including:

a command identifier field including a six-bit binary value within the set of 111111 and 111101; 

an image data format parameter; 

a color element size parameter; 
an image width parameter; and 

a base address parameter. 

85. A system as in claim 81 further including means for providing the image data format parameter selecting between 
the following: 


(a) rgba, 

(b) yuv, 

(c) color index, 

(d) intensity alpha, and 
(e) alpha. 

86. The system as in claim 81 further including means for providing the color element size parameter selecting between 
the following: 

(a) 4 bit wide color element value, 

(b) 8 bit wide color element value, 

(c) 16 bit wide color element value, and 

(d) 32 bit wide color element value. 

87. The system as in claim 81 further including means for providing the image width parameter value specifying the 
width in pixels of an image stored in memory. 

88. The system as in claim 81 further including means for generating the base address parameter specifying the base 
address in main memory of the top left corner of the image. 

89. In a 3D graphics system, a process of executing at least one color image mode command including: 

(a) interpreting a command identifier field including a six-bit binary value within the set of 111111 and 111101 
as corresponding to color image mode; 

(b) interpreting at least one of the following: 

an image data format parameter; 
a color element size parameter; 
an image width parameter; and 
a base address parameter; and 

(c) generating a color image display based at least in part on step (b). 



90. A process as in claim 86 further including the steps of selecting between the following based on the image data 
format parameter: 

(a) rgba, 
(b) yuv, 

(c) color index, 

(d) intensity alpha, and 

(e) alpha. 

91. The process as in claim 86 further including the step of selecting between the following based on the color element 
size parameter: 

(a) 4 bit wide color element value, 

(b) 8 bit wide color element value, 

(c) 16 bit wide color element value, and 

(d) 32 bit wide color element value. 

92. The process as in claim 86 further including the step of setting the width in pixels of an image stored in memory 
based on the image width parameter value. 


93. The process as in claim 86 further including the step of setting the base address in main memory of the top left 
corner of the image based on the base address parameter. 

94. A 3D graphics system for executing at least one color image mode command having a command identifier field 
including a six-bit binary value within the set of 111111 and 111101 as corresponding to color image mode, the 
system including: 

a command identifier decoder for interpreting a command identifier field including a six-bit binary value within 
the set of 111111 and 111101 as corresponding to color image mode; 
a command parameter decoder for interpreting at least one of the following: 

an image data format parameter; 
a color element size parameter; 
an image width parameter; and 
a base address parameter; and 

a display circuit that generates a color image display based at least in part on parameters. 

95. An apparatus as in claim 91 further including structure for selecting between the following based on the image 
data format parameter: 

(a) rgba, 

(b) yuv, 

(c) color index, 

(d) intensity alpha, and 

(e) alpha. 

96. The process as in claim 91 further including structure for selecting between the following based on the color element 
size parameter: 

(a) 4 bit wide color element value, 

(b) 8 bit wide color element value, 

(c) 16 bit wide color element value, and 

(d) 32 bit wide color element value. 

97. The process as in claim 91 further including means for setting the width in pixels of an image stored in memory 
based on the image width parameter value. 


98. The process as in claim 91 further including a circuit for setting the base address in main memory of the top left 
corner of the image based on the base address parameter. 

99. A storage medium for use with a 3D graphics system, the storage medium for storing at least one color image 
mode command including: 

a command identifier field including a six-bit binary value within the set of 111111 and 111101; 
an image data format parameter; 
a color element size parameter; 
an image width parameter; and 

a base address parameter. 

100. A storage medium as in claim 96 further including means for storing an image data format parameter selecting 
between the following: 


(a) rgba, 

(b) yuv, 

(c) color index, 

(d) intensity alpha, and 
(e) alpha. 

101. The storage medium as in claim 96 further including means for storing a color element size parameter selecting 
between the following: 

(a) 4 bit wide color element value, 

(b) 8 bit wide color element value, 

(c) 16 bit wide color element value, and 

(d) 32 bit wide color element value. 

102. The storage medium as in claim 96 further including means for storing an image width parameter value specifying 
the width in pixels of an image stored in memory. 

103. The storage medium as in claim 96 further including means for storing the base address parameter specifying the 
base address in main memory of the top left corner of the image. 


104. A process for generating at least one mask image mode command for processing by a 3D graphics system, the 
process including the step of generating at least one set mask image command including: 

a command identifier field including a six-bit binary value of 111110, and 
a base address specifying the memory address of a top left corner of at least one depth image. 

105. A system for providing at least one mask image mode command for processing by a 3D graphics system, the system 
including: 

at least one processor, 

at least one memory, and 

means coupled to the processor and to the memory for providing at least one set mask image command 
including: 
a command identifier field including a six-bit binary value of 111110, and 

a base address specifying the memory address of a top left corner of at least one depth image. 

106. In a 3D graphics system, a process for interpreting at least one mask image mode command including: 

interpreting a command identifier field including a six-bit binary value of 111110, and 

interpreting a base address specifying the memory address of a top left corner of at least one depth image. 

107. In a 3D graphics system, a decoder for interpreting at least one mask image mode command including a six-bit 
binary value of 111110, the system including: 

means for interpreting a command identifier field including a six-bit binary value of 111110, and 

means for interpreting a base address specifying the memory address of a top left corner of at least one depth 
image. 

108. A storage medium for use with a 3D graphics system, the storage medium for storing at least one mask image 
mode command including: 

a command identifier field including a six-bit binary value of 111110, and 
a base address specifying the memory address of a top left corner of at least one depth image. 

109. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one triangle drawing command having a command identifier field including 
a six-bit binary value within the range of 001000 to 001111 followed by at least x and y position values, the triangle 
drawing command format specifying at least one triangle at the x and y positions corresponding to the x and y 
position values. 

110. A system for generating at least one display command for processing by a 3D graphics system, the system 
including: 

at least one processor, 
at least one memory, and 

means coupled to the processor and to the memory for providing at least one triangle drawing command 
having a command identifier field including a six-bit binary value within the range of 001000 to 001111 followed 
by at least x and y position values, the triangle drawing command format specifying at least one triangle at 
the x and y positions corresponding to the x and y position values. 

111. In a 3D graphics system, a process for executing at least one display command for processing including the steps of: 

(a) interpreting at least one triangle drawing command having a command identifier field including a six-bit 
binary value within the range of 001000 to 001111 followed by at least x and y position values, and 
(b) rendering at least one primitive at the x and y positions corresponding to the x and y position values. 

112. A 3D graphics system for executing at least one display command including a six-bit binary value within the range 
of 001000 to 001111 followed by at least x and y position values, the system including: 

a decoder that interprets at least one triangle drawing command having a command identifier field including 
a six-bit binary value within the range of 001000 to 001111 followed by at least x and y position values, and 
a display processor that renders at least one primitive at the x and y positions corresponding to the x and y 
position values. 

113. A storage medium for use with a 3D graphics system, the storage medium storing at least one display command 
having a command identifier field including a six-bit binary value within the range of 001000 to 001111 followed by 
at least x and y position values, the triangle drawing command format specifying at least one triangle at the x and 
y positions corresponding to the x and y position values. 

114. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one triangle drawing command including: 

a command identifier field having a six-bit binary value of at least one of 001111 and 001011, 

a set of edge coefficients, 

a set of texture coefficients, and 

a set of z buffer coefficients, 

the triangle drawing command format specifying at least one triangle to be drawn in accordance with the set of 
edge coefficients, filled with a texture based at least in part on the set of texture coefficients, and z buffered based 
at least in part on the set of z buffer coefficients. 

115. A system for providing at least one display command for processing by a 3D graphics system, the system including: 

at least one processor, 
at least one memory, and 

circuitry coupled to the processor and to the memory for providing at least one triangle drawing command 
including: 

a command identifier field having a six-bit binary value of at least one of 001111 and 001011, 

a set of edge coefficients, 

a set of texture coefficients, and 

a set of z buffer coefficients, 


the triangle drawing command format specifying at least one triangle to be drawn in accordance with the set of 
edge coefficients, filled with a texture based at least in part on the set of texture coefficients, and z buffered based 
at least in part on the set of z buffer coefficients. 

116. In a 3D graphics system, a process for executing at least one triangle drawing command including: 

(a) interpreting a command identifier field having a six-bit binary value of at least one of 001111 and 001011, 

(b) interpreting a set of edge coefficients, 

(c) interpreting a set of texture coefficients, 

(d) interpreting a set of z buffer coefficients, and 

(e) rendering at least one triangle in accordance with the set of edge coefficients, filled with a texture based 
at least in part on the set of texture coefficients, and z buffered based at least in part on the set of z buffer 
coefficients. 

117. A 3D graphics system for executing at least one triangle drawing command having a command identifier field 
having a six-bit binary value of at least one of 001111 and 001011, the system including: 

(a) means for interpreting a command identifier field having a six-bit binary value of at least one of 001111 and 
001011, 

(b) means for interpreting a set of edge coefficients, 

(c) means for interpreting a set of texture coefficients, 

(d) means for interpreting a set of z buffer coefficients, and 

means coupled to the above-mentioned means for rendering at least one triangle in accordance with 
the set of edge coefficients, filled with a texture based at least in part on the set of texture coefficients, and z 
buffered based at least in part on the set of z buffer coefficients. 

118. A storage medium for use with a 3D graphics system, the storage medium for storing at least one triangle drawing 
command including: 

a command identifier field having a six-bit binary value of at least one of 001111 and 001011, 

a set of edge coefficients, 
a set of texture coefficients, and 
a set of z buffer coefficients, 

the triangle drawing command format specifying at least one triangle to be drawn in accordance with the set of 
edge coefficients, filled with a texture based at least in part on the set of texture coefficients, and z buffered based 
at least in part on the set of z buffer coefficients. 

119. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one triangle drawing command including: 

a command identifier field having a six-bit binary value selected from the set of 001010 and 001110, 
a set of edge coefficients, and 
a set of texture coefficients, 

the triangle drawing command format specifying at least one triangle to be drawn in accordance with the set of 
edge coefficients, and filled with a texture based at least in part on the set of texture coefficients. 

120. A system for generating at least one display command for processing by a 3D graphics system, the system 
including: 

at least one processor, 
at least one memory, and 

circuitry coupled to the processor and to the memory for generating at least one triangle drawing command 
including: 

a command identifier field having a six-bit binary value selected from the set of 001010 and 001110, 
a set of edge coefficients, and 
a set of texture coefficients, 

the triangle drawing command format specifying at least one triangle to be drawn in accordance with the set of 
edge coefficients, and filled with a texture based at least in part on the set of texture coefficients. 

121. In a 3D graphics system, a process for executing at least one triangle drawing command including: 

(a) interpreting a command identifier field having a six-bit binary value selected from the set of 001010 and 
001110, 

(b) interpreting a set of edge coefficients, 

(c) interpreting a set of texture coefficients, and 

(d) drawing at least one triangle in accordance with the set of edge coefficients, and filled with a texture based 
at least in part on the set of texture coefficients. 

122. A 3D graphics system for executing at least one triangle drawing command having a command identifier field 
having a six-bit binary value selected from the set of 001010 and 001110, the system including: 

a command identifier field decoder that interprets a command identifier field having a six-bit binary value 
selected from the set of 001010 and 001110, 
a rasterizer that interprets a set of edge coefficients, 
a texture coordinate unit that interprets a set of texture coefficients, and 
a display processor that draws at least one triangle in accordance with the set of edge coefficients, and filled 
with a texture based at least in part on the set of texture coefficients. 

123. A storage medium for use with a 3D graphics system, the storage medium for storing at least one triangle drawing 
command including: 

a command identifier field having a six-bit binary value selected from the set of 001010 and 001110, 
a set of edge coefficients, and 
a set of texture coefficients, 

the triangle drawing command format specifying at least one triangle to be drawn in accordance with the set of 
edge coefficients, and filled with a texture based at least in part on the set of texture coefficients. 

124. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one triangle drawing command including: 

a command identifier field having a six-bit binary value of a value selected from the set of 001001, 
a set of edge coefficients, and 
a set of z buffer coefficients, 

the triangle drawing command format specifying at least one non-shaded triangle to be drawn in accordance with 
the set of edge coefficients and z buffered based at least in part on the set of z buffer coefficients. 

125. A system for providing at least one display command for processing by a 3D graphics system, the system including: 

at least one processor, 
at least one memory, and 

circuitry coupled to the processor and to the memory for providing at least one triangle drawing command 
including: 

a command identifier field having a six-bit binary value of a value selected from the set of 001001, 
a set of edge coefficients, and 
a set of z buffer coefficients, 


the triangle drawing command format specifying at least one non-shaded triangle to be drawn in accordance with 
the set of edge coefficients and z buffered based at least in part on the set of z buffer coefficients. 

126. In a 3D graphics system, a process for executing at least one triangle drawing command including: 

(a) interpreting a command identifier field having a six-bit binary value of a value selected from the set of 
001001, 

(b) interpreting a set of edge coefficients, 

(c) interpreting a set of z buffer coefficients, and 

(d) drawing at least one non-shaded triangle in accordance with the set of edge coefficients and z buffered 
based at least in part on the set of z buffer coefficients. 

127. A 3D graphics system for executing at least one triangle drawing command having a command identifier field 
having a six-bit binary value of a value selected from the set of 001001, the system including: 

a decoder that interprets a command identifier field having a six-bit binary value of a value selected from the 
set of 001001, 

an edge walker that interprets a set of edge coefficients, 
a z buffer controller that interprets a set of z buffer coefficients, and 
a display processor that draws at least one non-shaded triangle in accordance with the set of edge coefficients 
and z buffered based at least in part on the set of z buffer coefficients. 

128. A storage medium for use with a 3D graphics system, the storage medium storing at least one triangle drawing 
command including: 

a command identifier field having a six-bit binary value of a value selected from the set of 001001, 
a set of edge coefficients, and 
a set of z buffer coefficients, 

the triangle drawing command format specifying at least one non-shaded triangle to be drawn in accordance with 
the set of edge coefficients and z buffered based at least in part on the set of z buffer coefficients. 
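
The triangle identifiers recited in the preceding claims all fall in the range 001000 to 001111, and the coefficient sets named for each identifier are consistent with the three low-order bits acting as independent flags. The sketch below decodes an identifier on that basis. Treating bit 0 as "z buffered", bit 1 as "textured" and bit 2 as "shaded" is an inference from the identifier groupings, not something the claims state, and the function name `decode_triangle_identifier` is hypothetical.

```python
def decode_triangle_identifier(identifier):
    """Split a six-bit triangle command identifier into assumed feature flags.

    Assumption: within the range 001000..001111, bit 0 selects z buffer
    coefficients, bit 1 selects texture coefficients and bit 2 selects
    shading. The claims only list which coefficient sets accompany which
    identifiers; the per-bit reading is an inference from those lists.
    """
    if not 0b001000 <= identifier <= 0b001111:
        raise ValueError("not a triangle drawing command identifier")
    return {
        "shaded": bool(identifier & 0b100),
        "textured": bool(identifier & 0b010),
        "z_buffered": bool(identifier & 0b001),
    }
```

Under this reading, 001011 and 001111 both carry texture and z buffer coefficients, 001010 and 001110 carry texture coefficients without z buffering, and 001001 is the non-shaded, z buffered case.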

129. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one textured rectangle drawing command including: 


a command identifier field having a six-bit binary value within the range of 100100 and 100101, 
at least two x coordinate values, 
at least two y coordinate values, 
a set of texture coefficients, and 
a tile descriptor index value, 

the textured rectangle drawing command format specifying at least one rectangle to be drawn in accordance with 
the x and y coordinate values, and filled with a texture based at least in part on the set of texture coefficients and 
the tile descriptor index value. 

130. A system for providing at least one display command for processing by a 3D graphics system, the system including: 

at least one processor, 
at least one memory, and 

means coupled to the processor and to the memory for providing at least one textured rectangle drawing 
command including: 

a command identifier field having a six-bit binary value within the range of 100100 and 100101, 
at least two x coordinate values, 
at least two y coordinate values, 
a set of texture coefficients, and 
a tile descriptor index value, 

the textured rectangle drawing command format specifying at least one rectangle to be drawn in accordance with 
the x and y coordinate values, and filled with a texture based at least in part on the set of texture coefficients and 
the tile descriptor index value. 

131. In a 3D graphics system, a process for executing at least one textured rectangle drawing command including: 

(a) interpreting a command identifier field having a six-bit binary value within the range of 100100 and 100101, 

(b) interpreting at least two x coordinate values, 

(c) interpreting at least two y coordinate values, 

(d) interpreting a set of texture coefficients, 

(e) interpreting a tile descriptor index value, and 

(f) rendering at least one rectangle in accordance with the x and y coordinate values, and filled with a texture 
based at least in part on the set of texture coefficients and the tile descriptor index value. 

132. A 3D graphics system for executing at least one textured rectangle drawing command having a command identifier 
field having a six-bit binary value within the range of 100100 and 100101, the system including: 

a decoder that interprets a command identifier field having a six-bit binary value within the range of 100100 
and 100101, 

a processor that interprets at least two x coordinate values and at least two y coordinate values, 
a texture coordinate unit that interprets a set of texture coefficients and a tile descriptor index value, and 
a display processor that renders at least one rectangle in accordance with the x and y coordinate values, and 
filled with a texture based at least in part on the set of texture coefficients and the tile descriptor index value. 

133. A storage medium for use with a 3D graphics system, the storage medium for storing at least one textured rectangle 
drawing command including: 

a command identifier field having a six-bit binary value within the range of 100100 and 100101, 
at least two x coordinate values, 
at least two y coordinate values, 
a set of texture coefficients, and 
a tile descriptor index value, 

the textured rectangle drawing command format specifying at least one rectangle to be drawn in accordance with 
the x and y coordinate values, and filled with a texture based at least in part on the set of texture coefficients and 
the tile descriptor index value. 

134. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the steps of: 

(a) generating at least one set primitive color command including: 

a command identifier field having a six-bit binary value of 111010, and 
a set of color coordinates; and 
(b) generating at least one filled rectangle drawing command including: 

a command identifier field having a six-bit binary value of 110110, 
at least two x coordinate values, and 
at least two y coordinate values, 

the filled rectangle drawing command format specifying at least one rectangle to be drawn in accordance with the 
x and y coordinate values, and filled with a color based at least in part on the color coordinates. 

135. A system for generating at least one display command for processing by a 3D graphics system, the system in- 
cluding: 

at least one processor, 
at least one memory, and 

means coupled to the processor and to the memory for generating: 

(a) at least one set primitive color command including: 

a command identifier field having a six-bit binary value of 111010, and 
a set of color coordinates; and 

(b) at least one filled rectangle drawing command including: 

a command identifier field having a six-bit binary value of 110110, 
at least two x coordinate values, and 
at least two y coordinate values, 

the filled rectangle drawing command format specifying at least one rectangle to be drawn in accordance with the 
x and y coordinate values, and filled with a color based at least in part on the color coordinates. 

136. In a 3D graphics system, a process for executing at least one display command for processing by a 3D graphics 
system, the process including the steps of: 

(a) interpreting at least one set primitive color command including: 

a command identifier field having a six-bit binary value of 111010, and 
a set of color coordinates; 

(b) interpreting at least one filled rectangle drawing command including: 

a command identifier field having a six-bit binary value of 110110, 
at least two x coordinate values, and 
at least two y coordinate values; and 

(c) drawing at least one rectangle in accordance with the x and y coordinate values, and filled with a color 
based at least in part on the color coordinates. 
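
The three steps of this process (latch a primitive color, interpret a filled rectangle command, draw the rectangle in that color) can be sketched as a small software interpreter. The sketch assumes that commands arrive as already-decoded dictionaries, that the color is an (r, g, b, a) tuple, and that the two x and two y values are inclusive pixel bounds; none of these details come from the claims, and the function name `execute_display_list` is hypothetical.

```python
SET_PRIMITIVE_COLOR = 0b111010  # identifier recited for set primitive color
FILL_RECTANGLE = 0b110110       # identifier recited for filled rectangle


def execute_display_list(commands, width, height):
    """Run decoded commands against a blank framebuffer and return it.

    Assumed command shapes (illustrative only):
      {"id": SET_PRIMITIVE_COLOR, "color": (r, g, b, a)}
      {"id": FILL_RECTANGLE, "x": (x0, x1), "y": (y0, y1)}
    """
    blank = (0, 0, 0, 0)
    framebuffer = [[blank] * width for _ in range(height)]
    primitive_color = blank
    for command in commands:
        identifier = command["id"]
        if identifier == SET_PRIMITIVE_COLOR:
            # Step (a): remember the color for later fill commands.
            primitive_color = tuple(command["color"])
        elif identifier == FILL_RECTANGLE:
            # Step (b): read the two x and two y coordinate values.
            x_low, x_high = sorted(command["x"])
            y_low, y_high = sorted(command["y"])
            # Step (c): fill the rectangle, clipped to the framebuffer.
            for y in range(max(y_low, 0), min(y_high, height - 1) + 1):
                for x in range(max(x_low, 0), min(x_high, width - 1) + 1):
                    framebuffer[y][x] = primitive_color
        else:
            raise ValueError(f"unsupported identifier {identifier:06b}")
    return framebuffer
```

Keeping the primitive color as interpreter state, rather than as a field of the rectangle command, mirrors the way the claims split the operation into two separate commands.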

137. In a 3D graphics system, an apparatus for executing at least one display command having a command identifier 
field having a six-bit binary value of 111010, the apparatus including the following structures: 

means for interpreting at least one set primitive color command including: 

a command identifier field having a six-bit binary value of 111010, and 
a set of color coordinates; 

means for interpreting at least one filled rectangle drawing command including: 

a command identifier field having a six-bit binary value of 110110, 
at least two x coordinate values, and 
at least two y coordinate values; and 

means for drawing at least one rectangle in accordance with the x and y coordinate values, and filled with a 
color based at least in part on the color coordinates. 

138. A storage medium for use with a 3D graphics system, the storage medium for storing: 

(a) at least one set primitive color command including: 

a command identifier field having a six-bit binary value of 111010, and 
a set of color coordinates; and 

(b) at least one filled rectangle drawing command including: 

a command identifier field having a six-bit binary value of 110110, 
at least two x coordinate values, and 
at least two y coordinate values, 

the filled rectangle drawing command format specifying at least one rectangle to be drawn in accordance with the 
x and y coordinate values, and filled with a color based at least in part on the color coordinates. 

139. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one texture defining command including: 


a command identifier field having a six-bit binary value of a value of 110101, 
an image data format parameter, 
a color element size parameter, 
a tile line size parameter, 
a starting texture memory address, 

a tile descriptor index, 
a palette number, 

at least one texture coordinate clamp enable parameter, 
at least one texture coordinate mirror enable parameter, 
at least one texture coordinate wrapping/mirroring mask, and 

at least one texture coordinate level of detail shift parameter. 

140. A system for providing at least one display command for processing by a 3D graphics system, the system including: 

at least one processor, 

at least one memory, and 

means coupled to the processor and to the memory for providing at least one texture defining command 
including: 

a command identifier field having a six-bit binary value of a value of 110101, 
an image data format parameter, 

a color element size parameter, 

a tile line size parameter, 

a starting texture memory address, 

a tile descriptor index, 
a palette number, 

at least one texture coordinate clamp enable parameter, 

at least one texture coordinate mirror enable parameter, 

at least one texture coordinate wrapping/mirroring mask, and 

at least one texture coordinate level of detail shift parameter. 

141. In a 3D graphics system, a process for processing at least one texture defining command including: 

(a) interpreting a command identifier field having a six-bit binary value of a value of 110101, 

(b) interpreting an image data format parameter, 

(c) interpreting a color element size parameter, 

(d) interpreting a tile line size parameter, 

(e) interpreting a starting texture memory address, 
(f) interpreting a tile descriptor index, 

(g) interpreting a palette number, 

(h) interpreting at least one texture coordinate clamp enable parameter, 

(i) interpreting at least one texture coordinate mirror enable parameter, 
(j) interpreting at least one texture coordinate wrapping/mirroring mask, 

(k) interpreting at least one texture coordinate level of detail shift parameter, and 

(l) generating at least one image based at least in part on the above-mentioned steps. 

142. A 3D graphics apparatus for processing at least one texture defining command having a command identifier field 
having a six-bit binary value of a value of 110101, the apparatus including: 

means for interpreting a command identifier field having a six-bit binary value of a value of 110101, 
a circuit that interprets an image data format parameter, a color element size parameter, a tile line size 
parameter, a starting texture memory address, a tile descriptor index, a palette number, at least one texture 
coordinate clamp enable parameter, at least one texture coordinate mirror enable parameter, at least one 
texture coordinate wrapping/mirroring mask, and at least one texture coordinate level of detail shift parameter, 
and 

a display processor that generates at least one image based at least in part on the above-mentioned structures. 

143. A storage medium for use with a 3D graphics system, the storage medium for storing at least one texture defining 
command including: 

a command identifier field having a six-bit binary value of a value of 110101, 
an image data format parameter, 
a color element size parameter, 
a tile line size parameter, 
a starting texture memory address, 

a tile descriptor index, 
a palette number, 

at least one texture coordinate clamp enable parameter, 
at least one texture coordinate mirror enable parameter, 
at least one texture coordinate wrapping/mirroring mask, and 

at least one texture coordinate level of detail shift parameter. 

144. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one texture tile command including: 


a command identifier field having a six-bit binary value of a value within the set of 110100 and 110010, 
low and high tile S coordinates, 
low and high tile T coordinates, and 
a tile descriptor index. 

145. A system for providing at least one display command for processing by a 3D graphics system, the system including: 

at least one processor, 
at least one memory, and 

circuitry coupled to the processor and to the memory for providing at least one texture tile command including: 

a command identifier field having a six-bit binary value of a value within the set of 110100 and 110010, 
low and high tile S coordinates, 
low and high tile T coordinates, and 
a tile descriptor index, 

55 

146.ln a 3D graphics system, a process for executing at least one texture tile command including: 

(a) interpreting a command identifier field having a six-bit binary value of a value within the set of 110100 and 
110010, 

(b) interpreting low and high tile S coordinates, 

(c) interpreting low and high tile T coordinates, 

(d) interpreting a tile descriptor index, and 

(e) generating a display based at least in part on steps (a)-(d). 

147. A 3D graphics system for executing at least one texture tile command having a command identifier field having a 
six-bit binary value of a value within the set of 110100 and 110010, the system including: 

a decoder that interprets a command identifier field having a six-bit binary value of a value within the set of 
110100 and 110010, 

a texture unit that interprets low and high tile S coordinates, low and high tile T coordinates, and a tile descriptor 
index; and 

a display processor circuit that generates a display based at least in part on the coordinates identifier and 
command identifier. 

148. A storage medium for use with a 3D graphics system, the storage medium for storing at least one texture tile 
command including: 

a command identifier field having a six-bit binary value of a value within the set of 110100 and 110010, 

low and high tile S coordinates, 
low and high tile T coordinates, and 
a tile descriptor index. 
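The interpreting steps recited for the texture tile command (claims 146 to 148) can be sketched as a decoder that first tests the six-bit identifier for membership in the set {110100, 110010} and only then extracts the coordinates and the tile descriptor index. The two identifier values come from the claims; the 64-bit word, the 12-bit coordinate widths, the 3-bit tile index, all bit positions, and the function names are assumptions made for this example.

```python
TEXTURE_TILE_IDS = {0b110100, 0b110010}  # identifier set recited in the claims


def encode_texture_tile(command_id, s_low, t_low, tile, s_high, t_high):
    """Build a command word using an ASSUMED 64-bit layout."""
    if command_id not in TEXTURE_TILE_IDS:
        raise ValueError("not a texture tile command identifier")
    for coord in (s_low, t_low, s_high, t_high):
        if not 0 <= coord < (1 << 12):
            raise ValueError("coordinate does not fit in the assumed 12 bits")
    if not 0 <= tile < (1 << 3):
        raise ValueError("tile index does not fit in the assumed 3 bits")
    return ((command_id << 56) | (s_low << 44) | (t_low << 32)
            | (tile << 24) | (s_high << 12) | t_high)


def decode_texture_tile(word):
    """Return the decoded fields, or None if the identifier is not in the set."""
    command_id = (word >> 56) & 0x3F
    if command_id not in TEXTURE_TILE_IDS:
        return None
    return {
        "command_id": command_id,
        "s_low": (word >> 44) & 0xFFF,
        "t_low": (word >> 32) & 0xFFF,
        "tile": (word >> 24) & 0x7,
        "s_high": (word >> 12) & 0xFFF,
        "t_high": word & 0xFFF,
    }
```

Returning `None` for any other identifier lets a caller pass the same word on to decoders for other command types.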

149. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one texture memory block loading command including: 

a command identifier field having a six-bit binary value of 110011, 
low and high tile S coordinate parameters, 
a low tile T coordinate parameter, 

a T increment value, and 
a tile descriptor index. 

150. A system for generating at least one display command for processing by a 3D graphics system, the system 
including: 

at least one processor, 
at least one memory, and 

means coupled to the processor and to the memory for generating at least one texture memory block loading 
command including: 

a command identifier field having a six-bit binary value of 110011, 

low and high tile S coordinate parameters, 

a low tile T coordinate parameter, 

a T increment value, and 
a tile descriptor index. 

151. In a 3D graphics system, a process for executing at least one texture memory block loading command including: 

(a) interpreting a command identifier field having a six-bit binary value of 110011, 
(b) interpreting low and high tile S coordinate parameters, 

(c) interpreting a low tile T coordinate parameter, 

(d) interpreting a T increment value, 

(e) interpreting a tile descriptor index, and 

(f) displaying at least one textured primitive based at least in part on steps (a)-(e). 

152. A 3D graphics system for executing at least one texture memory block loading command having a command 
identifier field having a six-bit binary value of 110011, the system including: 

means for interpreting a command identifier field having a six-bit binary value of 110011, 
means for interpreting low and high tile S coordinate parameters, 
means for interpreting a low tile T coordinate parameter, 
means for interpreting a T increment value, 
means for interpreting a tile descriptor index, and 

a display circuit for displaying at least one textured primitive based at least in part on the S and T coordinate 
parameters, the T increment value, and the tile descriptor index. 

153. A storage medium for use with a 3D graphics system, the storage medium for storing at least one texture memory 
block loading command including: 

a command identifier field having a six-bit binary value of 110011, 
low and high tile S coordinate parameters, 
a low tile T coordinate parameter, 
a T increment value, and 

a tile descriptor index. 
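A stored texture memory block loading command of the kind recited in claims 149 to 153 can be modelled as a small record with conversions to and from a command word. This is a sketch only: the identifier 110011 and the list of fields come from the claims, while the class name `LoadBlock`, the 64-bit word, and the assumed widths (12 bits for each coordinate and for the T increment value, 3 bits for the tile descriptor index) and positions are illustrative choices.

```python
from typing import NamedTuple

LOAD_BLOCK_ID = 0b110011  # six-bit command identifier recited in the claims


class LoadBlock(NamedTuple):
    s_low: int        # low tile S coordinate parameter
    t_low: int        # low tile T coordinate parameter
    tile: int         # tile descriptor index
    s_high: int       # high tile S coordinate parameter
    t_increment: int  # T increment value

    def to_word(self) -> int:
        """Encode into a command word using an ASSUMED bit layout."""
        widths = {"s_low": 12, "t_low": 12, "tile": 3,
                  "s_high": 12, "t_increment": 12}
        for name, width in widths.items():
            value = getattr(self, name)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} exceeds {width} bits")
        return ((LOAD_BLOCK_ID << 56) | (self.s_low << 44)
                | (self.t_low << 32) | (self.tile << 24)
                | (self.s_high << 12) | self.t_increment)

    @classmethod
    def from_word(cls, word: int) -> "LoadBlock":
        """Decode a command word, rejecting any other identifier."""
        if (word >> 56) & 0x3F != LOAD_BLOCK_ID:
            raise ValueError("not a texture memory block loading command")
        return cls(
            s_low=(word >> 44) & 0xFFF,
            t_low=(word >> 32) & 0xFFF,
            tile=(word >> 24) & 0x7,
            s_high=(word >> 12) & 0xFFF,
            t_increment=word & 0xFFF,
        )
```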

154. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one load texture look up table command including: 

a command identifier field having a six-bit binary value of 110000, 
low and high indices into the table, and 
a tile descriptor index. 

155. A system for providing at least one display command for processing by a 3D graphics system, the system including: 

at least one processor, 
at least one memory, and 

circuitry coupled to the processor and to the memory for providing at least one load texture look up table 
command including: 

a command identifier field having a six-bit binary value of 110000, 
low and high indices into the table, and 
a tile descriptor index. 

156. In a 3D graphics system, a process for executing at least one load texture look up table command including: 

(a) interpreting a command identifier field having a six-bit binary value of 110000, 

(b) interpreting low and high indices into the table, 

(c) interpreting a tile descriptor index, and 

(d) loading a texture look up table into a memory based at least in part on steps (a)-(c). 

157. A 3D graphics system for executing at least one load texture look up table command having a command identifier 
field having a six-bit binary value of 110000, the system including: 

a texture memory, 

means for interpreting a command identifier field having a six-bit binary value of 110000, 

means for interpreting low and high indices into the table, 
means for interpreting a tile descriptor index, and 

means for loading a texture look up table into the texture memory based at least in part on the command 
identifier field and the indices. 



158. A storage medium for use with a 3D graphics system, the storage medium for storing at least one load texture 
look up table command including: 

a command identifier field having a six-bit binary value of 110000, 
low and high indices into the table, and 
a tile descriptor index. 

159. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one set color command including: 

a command identifier field having a six-bit binary value within the range of 110111 to 111011, 
a red component parameter, 
a green component parameter, 

a blue component parameter, and 
an alpha component parameter. 

160. A process as in claim 156 further including the step of generating an additional level of detail fraction field and an 
associated minimum clamp parameter. 

161. A system for generating at least one display command for processing by a 3D graphics system, the system 
including: 

at least one processor, 

at least one memory, and 

circuitry coupled to the processor and to the memory for generating at least one set color command including: 
a command identifier field having a six-bit binary value within the range of 110111 to 111011, 
a red component parameter, 
a green component parameter, 

a blue component parameter, and 
an alpha component parameter. 

162. A system as in claim 158 further including circuitry for providing an additional level of detail fraction field and an 
associated minimum clamp parameter. 

163. In a 3D graphics system, a process for interpreting at least one set color command including: 

(a) interpreting a command identifier field having a six-bit binary value within the range of 110111 to 111011, 
(b) interpreting a red component parameter, 

(c) interpreting a green component parameter, 

(d) interpreting a blue component parameter, 

(e) interpreting an alpha component parameter, and 

(f) generating an image based at least in part on steps (a)-(e). 

164. A process as in claim 160 further including the step of interpreting an additional level of detail fraction field and an 
associated minimum clamp parameter. 

165. A 3D graphics system for interpreting at least one set color command having a command identifier field having a 
six-bit binary value within the range of 110111 to 111011, the system including: 

means for interpreting a command identifier field having a six-bit binary value within the range of 110111 to 
111011, 

means for interpreting a red component parameter, 
means for interpreting a green component parameter, 

means for interpreting a blue component parameter, 
means for interpreting an alpha component parameter, and 

a display circuit that generates an image based at least in part on the parameters. 

166. An apparatus as in claim 162 further including means for interpreting an additional level of detail fraction field and 
an associated minimum clamp parameter. 

167. A storage medium for use with a 3D graphics system, the storage medium storing at least one set color command 
including: 

a command identifier field having a six-bit binary value within the range of 110111 to 111011, 
a red component parameter, 
a green component parameter, 
a blue component parameter, and 
an alpha component parameter. 

168. A storage medium as in claim 164 further including means for storing an additional level of detail fraction field and 
an associated minimum clamp parameter. 
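The set color claims differ from the preceding ones in that the identifier is recited as a range (110111 to 111011) rather than a single value. A minimal sketch of the corresponding range test and of packing the four color components follows. The range bounds come from the claims; the 64-bit word, the 8-bit width of each component, their placement in the low 32 bits, and the function names are assumptions made for this example, and the optional level of detail fraction and minimum clamp fields of the dependent claims are not modelled.

```python
SET_COLOR_ID_MIN = 0b110111  # lower bound of the recited identifier range
SET_COLOR_ID_MAX = 0b111011  # upper bound of the recited identifier range


def is_set_color(command_id):
    """True if the six-bit identifier falls within the recited range."""
    return SET_COLOR_ID_MIN <= command_id <= SET_COLOR_ID_MAX


def encode_set_color(command_id, red, green, blue, alpha):
    """Pack identifier and RGBA into a word (ASSUMED 8 bits per component)."""
    if not is_set_color(command_id):
        raise ValueError("identifier outside the set color range")
    for component in (red, green, blue, alpha):
        if not 0 <= component <= 255:
            raise ValueError("component does not fit in the assumed 8 bits")
    return ((command_id << 56) | (red << 24) | (green << 16)
            | (blue << 8) | alpha)


def decode_set_color(word):
    """Return (command_id, red, green, blue, alpha), or None if out of range."""
    command_id = (word >> 56) & 0x3F
    if not is_set_color(command_id):
        return None
    return (command_id,
            (word >> 24) & 0xFF,
            (word >> 16) & 0xFF,
            (word >> 8) & 0xFF,
            word & 0xFF)
```

The recited range spans five consecutive identifier values, so a single comparison pair covers every set color command without enumerating them.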

169. A process for generating at least one display command for processing by a 3D graphics system, the process 
including the step of generating at least one set primitive depth command including: 

a command identifier field having a six-bit binary value of 101110, 
a primitive depth parameter, and 
a primitive delta depth parameter. 

170. A system for providing at least one display command for processing by a 3D graphics system, the system including: 

at least one processor, 
at least one memory, and 

means coupled to the processor and to the memory for providing at least one set primitive depth command 
including: 

a command identifier field having a six-bit binary value of 101110, 
a primitive depth parameter, and 
a primitive delta depth parameter. 

171. In a 3D graphics system, a process for processing at least one set primitive depth command including: 

(a) interpreting a command identifier field having a six-bit binary value of 101110, 

(b) interpreting a primitive depth parameter, 

(c) interpreting a primitive delta depth parameter, and 

(d) generating at least one image based at least in part on steps (a)-(c). 

172. A 3D graphics system for processing at least one set primitive depth command having a command identifier field 
having a six-bit binary value of 101110, the system including: 

means for interpreting a command identifier field having a six-bit binary value of 101110, 

means for interpreting a primitive depth parameter, 

means for interpreting a primitive delta depth parameter, and 

a display circuit that generates at least one image based at least in part on the parameters. 
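Each system claim in this group begins with means for interpreting the six-bit command identifier. One way to picture that step is a classifier that maps an identifier to the command type named in the claims. The identifier values and command names below are those recited in claims 143 to 173; the function name `classify` and the choice to return `None` for identifiers not recited in these claims are assumptions made for this sketch.

```python
# Single-valued identifiers recited in claims 143 to 173.
SINGLE_IDS = {
    0b110101: "texture defining",
    0b110100: "texture tile",
    0b110010: "texture tile",
    0b110011: "texture memory block loading",
    0b110000: "load texture look up table",
    0b101110: "set primitive depth",
}

# Set color is recited as a range of identifiers rather than one value.
SET_COLOR_RANGE = range(0b110111, 0b111011 + 1)


def classify(command_id):
    """Map a six-bit identifier to a command name, or None if not listed."""
    if not 0 <= command_id < 64:
        raise ValueError("identifier must fit in six bits")
    if command_id in SET_COLOR_RANGE:
        return "set color"
    return SINGLE_IDS.get(command_id)
```

A decoder built this way would dispatch to a field-specific interpreter once the command type is known; identifiers for commands claimed elsewhere in the document are simply reported as unlisted here.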

173. A storage medium for use with a 3D graphics system, the storage medium storing at least one set primitive depth 
command including: 

a command identifier field having a six-bit binary value of 101110, 
a primitive depth parameter, and 
a primitive delta depth parameter. 